Preface



Many complex systems found in nature can be viewed as function optimizers. In 
particular, they can be viewed as such optimizers of functions in extremely high- 
dimensional spaces. Given the difficulty of performing such high-dimensional op- 
timization with modern computers, there has been a lot of exploration of computa- 
tional algorithms that try to emulate those naturally-occurring function optimizers. 
Examples include simulated annealing (SA [15, 18]), genetic algorithms (GAs) and 
evolutionary computation [2, 3, 9, 11, 20-22, 24, 28]. The ultimate goal of this work
is an algorithm that can, for any provided high-dimensional function, come close 
to extremizing that function. Particularly desirable would be such an algorithm that 
works in an adaptive and robust manner, without any explicit knowledge of the form 
of the function being optimized. In particular, such an algorithm could be used for 
distributed adaptive control — one of the most important tasks engineers will face in 
the future, when the systems they design will be massively distributed and horribly 
messy congeries of computational systems. 

Unfortunately, when performance is averaged over the space of all optimization
functions, no optimization algorithm outperforms random search [26,27]. If the algorithm is not well matched to
the function being optimized, it may do even worse than random search. Indeed, it 
has not even been established that the optimization algorithms in nature outperform 
random search. As an example, the algorithm of natural selection is non-teleological;
it has no optimization goal whatsoever, and any ability in achieving such a goal is a 
side effect. Moreover, to the degree that it does achieve a particular goal, at least a 
large part of its success is due to its brute force massive parallelism. The number of 
genomes mutating and recombining simultaneously in the terrestrial biosphere may 
exceed Avogadro’s number. 

To understand what kinds of optimization functions are well matched to these 
naturally occurring complex systems, we need a unifying way of viewing those sys- 
tems. One feature shared by many of these naturally occurring complex systems is 
that they can be viewed as though the underlying variables were self-interested adap- 
tive agents. In some of these systems this is overt, the underlying variables being 
controlled by agents engaged in a noncooperative game, their equilibrium joint state 
(hopefully) maximizing the provided “world utility” function G [1,4,5, 13, 14, 16,23]. 
Examples of such systems are auctions and clearing of markets. Typically in the
computational algorithms inspired by such systems, each agent is a separate ma- 
chine learning algorithm [7, 8, 10, 17], e.g., a reinforcement learning (RL) algo- 
rithm [25,29]. 

Other complex systems found in nature that have inspired function-maximization 
algorithms are not usually considered in terms of noncooperative game theory.
However, even these systems are very often viewed as though their underlying variables
were controlled by self-interested agents. Examples include spin glasses (agents are
particles trying to extremize their free energies), genomes undergoing neo-Darwinian
natural selection (each genome is an agent trying to maximize its reproductive fit- 
ness), and eusocial insect colonies. These have been translated into SA, GAs and 
evolutionary computation, and swarm intelligence [6, 12, 19], respectively. 

One crucial issue concerning the systems that are explicitly noncooperative
games is whether the payoff function g_η of each player η is sufficiently sensitive
to the coordinates η controls in comparison with the other coordinates, so that it is
feasible for η to discern how to set its coordinates to achieve high payoff. A second
crucial issue is the need for all of the g_η to be aligned with G, so that as the players
individually learn how to increase their payoffs, G also increases. A particularly
important issue with collectives that are not explicitly noncooperative games is the
exploration/exploitation tradeoff.

Clearly then, both to better understand the behavior of these natural systems
and to perform high-dimensional optimization, we need a thorough overarching
understanding of collectives, that is, of systems in which there
is a provided world utility, and in which at least some of the system’s variables can
be viewed as self-interested adaptive agents. This book, compiling recent research
from the fields of physics, economics, computer science and biology, is a first foray 
into such an understanding. 

In assembling this book, we resisted the temptation to group chapters into 
sections connected to well-established fields (e.g., economics-based approaches, 
physics-based approaches). Though such sections may provide some order, they also 
would obscure the underlying theme of the book: the emerging field of collectives is 
connected to many current disciplines, but cannot be wholly captured by one or even 
a simple combination of those existing fields. Furthermore, such sections would suggest
that a closer similarity, or a certain homogeneity, exists among chapters within a
section, which is not generally true. Chapter 13, for example, could with a broad
brush be grouped with other chapters on computational economies, though doing so
would hide the groundbreaking work of the authors in bringing statistical mechanics 
concepts and computational market approaches into the engineering and computer 
science communities. Finally such sections would make it difficult to classify work 
that uses ideas from more than one field. Chapter 10 could fit within a physics section 
due to its analysis techniques, a biology section due to its first approach, and finally 
an economics section due to its second approach. Any of these choices though would 
not only hide the nature of the work, but also de-emphasize the practical and engi- 
neering contributions of the work. 

Instead, we let the chapters stand on their own as examples of work in collectives.
The first chapter in this volume provides a broad survey of the emerging field
of collectives. By formally defining what constitutes a collective, and discussing their 
various properties, it aims to provide a common language with which to approach this 
field. This introduction is followed by a brief overview of the various fields (multi- 
agent systems, mechanism design, game theory, statistical physics etc.) that are re- 
lated to collectives. The second chapter provides a theory of the “collective intelli- 
gence” framework, which focuses on the inverse problem of initializing and updating 
the collective’s structure (including the agents’ utility functions) so as to induce high 
values of the world utility. Through formal mathematics this chapter provides both 
an analysis of existing techniques and suggests new agent utility functions in collec- 
tives. 

Chapter 3 discusses how mechanism design can be used in the control of a col- 
lective. This chapter addresses the challenges traditional mechanism design faces in 
large distributed systems, and provides a learnable mechanism design framework 
drawing parallels with the collective intelligence framework. It is an extremely fresh 
look at traditional concepts. Chapter 4 focuses on cases where the collective’s overall 
goal (i.e., the world utility) is based on the agents’ utilities, and the designers have 
fewer options in inducing behavior that would benefit the collective. This interesting 
situation is detailed with an application example from the Internet. 

Chapter 5 analyzes how the efficiency (the world utility is indirectly set as the 
variance of the agents’ private utilities, hence the name “efficiency”) of a system de- 
pends on the learning properties of the agents in a collective using statistical physics. 
The minority game (also called the “El Farol Bar problem”) is used as the testbed. 
Chapter 6 focuses on the difficult task of predicting and controlling catastrophic 
changes in a collective. It shows how large macroscopic (e.g., system level) changes 
are encoded in the microscopic (e.g., agent level) dynamics. This leads to either de- 
sign stage modifications to a collective to avoid catastrophic changes or run-time 
monitoring to alleviate the effect of such changes. Chapter 7 focuses on the effect 
of communication on the evolution of a collective. Using the minority game as a 
testbed, it shows that local social networks can override global information and thus 
totally change the evolution of a collective. 

Chapter 8 provides the only work in this book which focuses solely on human 
agents. In this chapter, minority game experiments conducted on students suggest 
that many of the results obtained on computer agents (e.g., often simpler strategies 
yield better results for the agents) also hold for human agents. 

Chapter 9 presents techniques for designing self-reconfigurable robots. This
fascinating problem, where many simple modules can form complex shapes, presents a
unique distributed control problem. This chapter addresses those challenges by using
local rules only, providing a decentralized solution to this problem. Chapter 10
presents two approaches to the distributed control of collectives. Both a
biology-inspired solution based on local interactions leading to global behavior and a
market-based mechanism are presented in the context of foraging in a group of robots and
resource allocation. Chapter 11 provides an analysis of how the information presented
to the agents and the heterogeneity of the agents influence the world utility, showing
that local interactions can lead to higher world utility than global interactions.

Chapter 12 provides a look at the inverse problem of designing collectives from an
evolutionary algorithm viewpoint. It shows how certain selection and fitness-sharing
methods used in coevolution can be used in the design of a collective. Finally, Chapter
13 provides a market-based “computational ecosystem” approach to collectives. It 
presents theory describing the dynamics of the agents, and experimentally shows 
that very sophisticated agents can lead to undesirable behavior. Then it shows that 
such undesirable global behavior can be prevented by applying local control. 

With the variety in both the approaches used by the authors and the types of 
problems selected, this book presents a broad view of the current state of the art in 
the field of collectives. We hope that this book, along with the interaction it might
engender among scientists from different disciplines, will lead to the emergence of a
new and exciting field of collectives and the design of complex systems. 



Moffett Field, CA Kagan Tumer

July 2003 David Wolpert 

A Survey of Collectives



Kagan Tumer 1 and David Wolpert 2



Summary. Due to the increasing sophistication and miniaturization of computational compo- 
nents, complex, distributed systems of interacting agents are becoming ubiquitous. Such sys- 
tems, where each agent aims to optimize its own performance, but there is a well-defined set of 
system-level performance criteria, are called collectives. The fundamental problem in analyz- 
ing and designing such systems is in determining how the combined actions of a large number 
of agents lead to “coordinated” behavior on the global scale. Examples of artificial systems 
that exhibit such behavior include packet routing across a data network, control of an array of 
communication satellites, coordination of multiple rovers, and dynamic job scheduling across 
a distributed computer grid. Examples of natural systems include ecosystems, economies, and 
the organelles within a living cell. 

No current scientific discipline provides a thorough understanding of the relation between 
the structure of collectives and how well they meet their overall performance criteria. Although 
still very young, research on collectives has resulted in successes in both understanding and 
designing such systems. It is expected that as it matures and draws on other disciplines re- 
lated to collectives, this field will greatly expand the range of computationally addressable 
tasks. Moreover, in addition to drawing on them, such a fully developed field of collective 
intelligence may provide insight into already established scientific fields, such as mechanism 
design, economics, game theory, and population biology. This chapter provides a survey of 
the emerging science of collectives. 



1 Just What Is a “Collective”? 

As computing power increases, becomes cheaper, and is packed into smaller units, 
a new computational paradigm based on adaptive distributed computing is emerg- 
ing. Whether used for control or optimization of complex engineered systems or the 
analysis of natural systems, this new paradigm offers new and exciting solutions to 
the problems of the twenty-first century. However, before the full strength of this 
powerful computational paradigm can be harnessed, some fundamental issues need
to be addressed. 

In this chapter we provide a survey of approaches to large distributed systems 
called collectives. A collective is a large system of agents, 3 where each agent has a
private utility function it is trying to maximize using an adaptive utility-maximizing
algorithm, along with a world utility function that measures the full
system’s performance. 4

Many fields provide partial solutions to the design and study of collectives. In 
particular, game theory [11, 19,30,87], mechanism design [82,87], and multiagent 
reinforcement learning [53,56, 112, 192] are fields that grapple with some of the is- 
sues encountered in the field of collectives. However, although these fields provide 
some of the ingredients required for a full-fledged field of collectives, they fall short
of providing a suitable starting point for the development of such a field. Further- 
more, merging concepts from one of these fields with another is generally cumber- 
some due to the various assumptions — rarely explicit — deeply rooted in each field. 
What is needed for the field of collectives to develop and mature is a common lan- 
guage describing the various properties of collectives, a set of desirable properties, a 
theoretical framework, and a set of problems that will provide good testing grounds 
for new ideas in this field. 



1.1 Distinguishing Characteristics of Collectives 

Collectives can be characterized through many different distinguishing characteris- 
tics. Because the chapters in this volume focus on various design and analysis aspects 
of collectives, we briefly synopsize some distinguishing characteristics of collectives. 
These include the presence or absence of a well-defined world utility function; the 
forward or inverse approach; the presence or absence of centralized control or com- 
munications; the presence or absence of a model; and scalability, robustness, and 
adaptivity. 

World Utility Function 

Having a well-defined world utility function that concerns the behavior of the entire 
distributed system is crucial in the study of collectives. Such a world utility func- 
tion provides an objective quantification of how well the system is performing. In 
that light, in a collective, we are not concerned with an unquantifiable “emergent” 
behavior of the system. Rather we are interested in how the system maximizes the 
specified world utility function (of course, nothing precludes the world utility from 
depending on the emergent behavior of the system, assuming such behavior can be 
quantified). 

3 We use the term agent to refer to the components of the system, although the various fields 
surveyed use different terminology (e.g., player in game theory).

4 The world utility can be provided as part of the specifications of the system or “constructed” 
by the designer, as discussed later. 

The most natural type of world utility is a provided utility, one that comes as
part of the problem definition and specifies the overall performance criteria that the 
collective needs to meet. Examples of such world utilities include total throughput in 
a data network, total scientific information gathered by a team of deployables, total 
information downloaded by a constellation of satellites, the valuation of a company, 
or the percentage of available free energy exploited by an ecosystem. 

However, the lack of a provided world utility does not preclude a collective- 
based approach to a problem. In such a case, assuming the agents have some utility 
functions associated with them, a world utility can be constructed (e.g., construct a 
social welfare function in economics). Examples of such world utilities include sum 
of agent utilities, sum of agent utilities and variances, and the utility of the worst-off 
agent. Note that maximizing each of these constructed world utilities would result 
in different system behavior. What is particularly interesting in such problems is the 
relationship between the agents’ initial utility functions and the utility functions that 
they ought to pursue in order to maximize the constructed world utility function. 
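
The constructed world utilities listed above can be sketched in a few lines. This is only an illustration under stated assumptions: each agent's utility is taken to be a single number, the function names are hypothetical, and "sum of agent utilities and variances" is read here as the sum plus a weighted variance term, with a negative default weight so that unequal outcomes are penalized. The text does not fix that weighting.

```python
from statistics import pvariance


def sum_utility(utilities):
    """World utility as the plain sum of agent utilities."""
    return sum(utilities)


def sum_and_variance_utility(utilities, variance_weight=-1.0):
    """Sum of agent utilities plus a weighted variance term.

    The weight is an assumption of this sketch; a negative value
    penalizes inequality among the agents.
    """
    return sum(utilities) + variance_weight * pvariance(utilities)


def worst_off_utility(utilities):
    """World utility as the utility of the worst-off agent."""
    return min(utilities)


# Two hypothetical joint outcomes with the same total utility.
even = [3.0, 3.0, 3.0]
skewed = [9.0, 0.0, 0.0]

for name, world_utility in [
    ("sum", sum_utility),
    ("sum and variance", sum_and_variance_utility),
    ("worst-off agent", worst_off_utility),
]:
    print(name, world_utility(even), world_utility(skewed))
```

The plain sum cannot tell the two outcomes apart, whereas the other two constructions prefer the even outcome, which is one way to see why maximizing each constructed world utility leads to different system behavior.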

Forward (Analysis) vs. Inverse (Design) Problem 

Whether it has a provided or constructed world utility, a collective can be approached 
from two very different perspectives: analysis or the forward problem and design or 
the inverse problem. 

The forward problem focuses on how the localized attributes of a collective 
induce global behavior and thereby determine system performance. Generally, this 
problem arises in the study of already existing complex systems and is most naturally 
applicable to biological systems or systems that can be viewed as such. Examples 
of such systems include ecosystems or a living cell, where in each case, the local 
interactions (species and organelles, respectively) lead to complex emergent behavior 
on a large scale. 

Engineered systems such as processes (e.g., the space shuttle maintenance and 
refurbishment process) or (economic) organizations, can also be viewed as forward 
problems in collectives. In those cases, the analysis approach can lead to predictive 
models and detect interactions among components of the system that may lead to 
breakdowns (e.g., determining whether a component considered “safe” can cause a 
critical malfunction when it interacts with another “safe” component). 

The inverse problem, on the other hand, arises when we wish to design a system 
to induce behavior that maximizes the world utility. Here, the designer either has 
the freedom to assign the private utility functions of the agents (e.g., determine what 
each satellite or router should be doing) or needs to design incentives that will be 
added to the preexisting private utilities of the agents (e.g., economics, where agents 
are humans). In either case, though, the focus is on guiding the system toward states where the
world utility is high. 

Centralized Communication or Control 

Though not in the formal definition of a collective, many collectives are decentralized 
systems. With few exceptions, it is difficult, if not impossible, to have centralized 
control in a collective, not only because reaching each agent may be problematic,
but more fundamentally, because in many cases a centralized algorithm may not be 
able to determine what each agent should do. 

Similarly, though some amount of global communication (e.g., broadcasting) 
may be possible, in general there will be little to no centralized communication, 
where a small subset of agents not only communicates with all the other agents but 
communicates differently with each of those other agents. Establishing the amount 
of allowed (or possible) centralized communication and control will be one of the 
fundamental issues in a collective. 

Model-Based vs. Model-Free Approaches 

Another important characteristic of a collective is the presence or absence of a model 
describing the dynamics of the system. A model-based approach consists of: 

1. constructing a detailed model of the dynamics governing the collective; 

2. learning the function that maps the parameters of the model to the resulting dy- 
namics of the system (in practice, this step can involve significant hand-tuning); 
and 

3. (a) drawing conclusions about this system based on the model (forward problem) 
and (b) determining parameters of the model that will yield the desired behavior 
(inverse problem). 

A fundamentally different approach, however, is to dispense with building a 
model altogether, on the grounds that large complex systems are generally noisy, 
faulty, and often operate in nonstationary environments. In such cases, coming up 
with a detailed model that captures the dynamics in an accurate manner is often ex- 
traordinarily difficult. 

A model-free approach relies on agents “reacting” to the environment (e.g., 
through a reinforcement learning mechanism). As such they avoid explicitly mod- 
eling the system in which they operate, and they avoid the potentially infinite regress 
when one agent tries to model another’s behavior and the other agent is itself model- 
ing the first agent’s behavior. 

The model-based-vs.-model-free choice has significant consequences on how the 
system can adapt and scale up and on how lessons learned from one domain can 
map to another. A model-based approach may be the choice for domains where the 
designer can develop detailed models and have a moderate degree of control over 
the environment. However, in domains where detailed models are not available or 
where there is reason to believe changes in the environment can lead to significant 
deviations from any model, a model-free approach is preferable. 
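
As a minimal sketch of what "reacting" to the environment can look like, the following agent keeps only a running value estimate for each of its own actions and updates it from observed rewards. It builds no model of the system or of other agents. The class name, the epsilon-greedy rule, the optimistic initial values, and the toy reward function are all assumptions of this sketch and are not taken from any chapter in this volume.

```python
import random


class ReactiveAgent:
    """Model-free learner: stores one value estimate per action, nothing else."""

    def __init__(self, n_actions, step_size=0.1, epsilon=0.1,
                 initial_value=1.0, seed=0):
        # Optimistic initial values encourage each action to be tried once.
        self.values = [initial_value] * n_actions
        self.step_size = step_size
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def greedy_action(self):
        """Action with the highest current estimate (first one on ties)."""
        return max(range(len(self.values)), key=self.values.__getitem__)

    def act(self):
        """Explore with probability epsilon, otherwise act greedily."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return self.greedy_action()

    def learn(self, action, reward):
        """Move the estimate for the chosen action toward the observed reward."""
        self.values[action] += self.step_size * (reward - self.values[action])


def run(agent, reward_fn, steps):
    """Let the agent act and learn for a number of steps."""
    for _ in range(steps):
        action = agent.act()
        agent.learn(action, reward_fn(action))
    return agent


# Toy environment (hypothetical): only action 1 is rewarded.
agent = run(ReactiveAgent(n_actions=2, epsilon=0.0), lambda a: float(a == 1), 50)
print(agent.greedy_action(), agent.values)
```

Because the agent only maps its own actions to observed rewards, the same code can be dropped into a different domain unchanged, which is the portability argument made above for model-free approaches.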

Scalability 

One of the implicit defining properties of a collective is that it is a large system of 
distributed agents. As such, scalability is a fundamental property of any approach 
that aims to study or design a collective. Although this does not preclude extending
extant analysis and design tools appropriate for single (or small) systems to large 
systems, it does suggest that in most instances, new ways of approaching the problem 
are likely to be more appropriate (e.g., a game-theoretic equilibrium analysis for a 
million nanodevices is unlikely to provide useful insight into the behavior of the 
collective). 

Adaptivity 

Although scalability does not require that the system be adaptive, it provides a strong 
impetus to move in that direction. Any approach that allows adaptivity or learning 
will have a significant advantage over one that does not, simply because the larger a 
system, the more difficult it will be to know a priori all the “right moves” for each 
agent. 

Furthermore, the need for adaptivity extends beyond each agent in the collec- 
tive. Indeed, the structure of the collective itself (e.g., the communication channels 
among the agents and the agents’ utility function) in many cases is adaptive. In nat- 
ural collectives this system-level adaptivity is generally implicit (e.g., the interaction 
among species in an ecosystem or the relationship among employees in a company), 
whereas in artificial systems it must be built in. 

Robustness 

Another desirable property of a collective is that it be robust, i.e., that in order to 
reach good values of the world utility, the collective not require that many parameters 
be set “just right,” that each agent operate failure-free, and that their interaction be 
carefully constructed. Clearly, as the number of agents in a system increases and their 
interaction with one another and the environment becomes more complex, it will 
be increasingly difficult to predict conditions that will lead the system to maximize 
the world utility. It is therefore imperative that the collective be insensitive to the 
specific values of some parameters or the specific operation of a small subset of 
its agents (e.g., in general, the poor performance of one employee does not bring a 
company down, or the demise of a single individual does not result in the extinction 
of a species). 

1.2 Canonical Experimental Domains 

The previous section provided a list of distinguishing characteristics of collectives.
The usefulness of defining these characteristics is in their providing a common lan- 
guage for a field of collectives. For example, a particular instance of data routing in a 
telecommunications network can be characterized as “a model-free inverse problem 
involving a provided world utility function where there is limited broadcast informa- 
tion but no form of global control.” 
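
One way to make this common language concrete is to record the characteristics of Section 1.1 as fields of a small data structure. The sketch below is illustrative only: the class and field names are hypothetical, and it covers just the characteristics used in the routing example above.

```python
from dataclasses import dataclass
from enum import Enum


class WorldUtility(Enum):
    PROVIDED = "provided"
    CONSTRUCTED = "constructed"


class Problem(Enum):
    FORWARD = "forward"
    INVERSE = "inverse"


@dataclass(frozen=True)
class CollectiveProfile:
    """Hypothetical record of a collective's distinguishing characteristics."""
    name: str
    world_utility: WorldUtility
    problem: Problem
    model_based: bool
    broadcast_information: bool
    global_control: bool

    def describe(self) -> str:
        """Render the profile as a one-line characterization."""
        model = "model-based" if self.model_based else "model-free"
        broadcast = ("limited broadcast information" if self.broadcast_information
                     else "no broadcast information")
        control = "global control" if self.global_control else "no form of global control"
        joiner = "but" if self.broadcast_information != self.global_control else "and"
        return (f"a {model} {self.problem.value} problem involving a "
                f"{self.world_utility.value} world utility function where there is "
                f"{broadcast} {joiner} {control}")


# The data routing instance described in the text.
routing = CollectiveProfile(
    name="data routing in a telecommunications network",
    world_utility=WorldUtility.PROVIDED,
    problem=Problem.INVERSE,
    model_based=False,
    broadcast_information=True,
    global_control=False,
)
print(routing.describe())
```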

We now provide examples of both engineered and natural systems that are ide- 
ally suited to be studied as collectives. For each, we provide one or more world utility 
functions, discuss how it can be approached (e.g., forward or inverse problem), and
list the assumptions (e.g., is it model-based?) and restrictions (e.g., is global
communication possible?) present.

• Control system for constellations of communication satellites: A candidate world 
utility for this problem is a measure of (potentially importance-weighted) infor- 
mation transferred. It is an example of an inverse problem, where centralized 
communication or control is likely to be difficult or impossible due to physical 
constraints (e.g., time lag) and where a model of the data flow is likely to be 
inadequate. 

• Control system for constellations of planetary exploration vehicles: A potential 
world utility for such a problem is a measure of the quality of scientific data 
collected. Although this can be viewed as an example of an inverse design prob- 
lem (as with constellations of satellites), it can also be approached as a forward 
problem, particularly if the vehicles have characteristics that cannot be altered 
(e.g., vehicles are built and we are confronted with the problem of predicting the 
behavior of the collective). 

• Control system for routing over a communication network: An obvious world 
utility for this problem is the total throughput of the communication network. 
Centralized communication or control in such a network is all but impossible, 
but some amount of broadcast information can filter its way to all the agents at 
regular time intervals. As an inverse problem, one would be required to design 
the private utility functions of the agents. As a forward problem on an already 
functioning network, one could determine the stress points of the system or the 
states that would cause the most congestion in the network. 

• Air space management: Given a problem specification where there is some lee- 
way in modifying the course and speed of airplanes, a potential world utility is the 
total delay at airports. The system designers are faced with the inverse problem 
of determining the incentives for the agents (whether they be pilots or air traffic 
controllers) so that their behavior (e.g., arrival times in the airport’s airspace) op- 
timizes the world utility. This is a case where global communication is possible, 
but global control is not. 

• Managing a power grid: A world utility based on the efficiency of the grid would 
be a good starting point for an inverse problem, involving some degree of central- 
ized communication or control. An alternative world utility may be robustness. 
In such a case a forward problem would involve finding how quickly the system 
responds to certain disturbances and how the system interactions can be modified 
to limit the propagation of those disturbances. 

• Job scheduling across a computational grid: A candidate world utility is the 
efficiency in processing the jobs entering the system. This problem is very similar 
to managing a power grid but provides a glimpse of the inverse problem: How 
should one set the rewards of the computational nodes so that they process the 
highest number of jobs collectively? A model-free solution involving learners at 
the computational nodes would be based on limited global communication. 

• Control of the elements of a nanocomputer: A potential world utility for this
problem is how well certain computations are carried out by the nanocomputer. 5 
In an inverse problem, one would focus on determining the structure of the adap- 
tive system, which would lead the agents to perform the desired computations. 
A particular instance of an inverse problem of this nature is the selection of sub- 
sets of faulty devices, where the world utility is the total aggregate error of the 
selected devices. 

• Study of a protocell: A potential world utility for this problem is the length of 
time the protocell 6 maintains its functionality. As a forward problem, this prob- 
lem consists of modeling the behavior of the system based on the organelles and 
their functions and interactions. With more leeway in the definition of the func- 
tions the organelles perform, one can view this as an interesting inverse problem: 
What should the organelles try to achieve to maintain the structure and function- 
ality of the protocell? 

• Study and design of an ecosystem: One world utility for the study of an ecosystem 
is the total biomass of the ecosystem. In a model-based forward problem, one 
can study the effect of various interactions on the world utility. Alternatively, as 
an inverse problem, one can investigate how to design an ecosystem that will 
provide the best sustainable biodiversity for a given mass (e.g., for a long-term 
space mission). 

• Design of incentives in a company: A “simple” world utility for a company is the 
valuation of the company (share price times the number of outstanding shares). 
The inverse problem consists of determining how to design incentives that will 
induce the company’s valuation to go up (e.g., what set of salaries, benefits, and 
stock options will induce the employees to take actions that will benefit the cor- 
poration). 

All of these problems share the property that they are inherently distributed sys- 
tems where the interactions among the agents lead to complex behavior. Although 
each can be approached by conventional methods, how those methods need to be 
modified to suit the particular application will be different in each case. The aim of 
this chapter is to both accentuate the similarities among these problems and to high- 
light the need for a general approach, which would address all these problems within 
the same framework. 



2 Review of the Literature Related to Collectives 

There are many approaches to analyzing and designing collectives that do not exactly 
meet the needs of a “field of collectives” yet provide some part of the equation. The 
rest of this section consists of brief presentations of some of these approaches and 
characterizes them in terms of the properties of collectives discussed earlier. 

5 A nanocomputer is a computer with nano-scale components. 

6 A protocell is a vesicle lacking conventional genetic material and organelles, especially
an artificially constructed one used to investigate cellular homeostasis. 

2.1 Artificial Intelligence and Machine Learning

There is an extensive body of work in artificial intelligence (AI) and machine learn- 
ing related to the design of collectives. Indeed, one of the most famous speculative 
works in the field can be viewed as an argument that AI should be approached as a 
design of collectives problem [163]. We will discuss some topics relevant to collec- 
tives from this domain. 



Distributed Artificial Intelligence 

The field of distributed artificial intelligence (DAI) has arisen as more traditional
AI tasks have migrated toward parallel implementation. The most direct approach to
such implementations is to parallelize AI production systems or the underlying
programming languages [79, 189]. An alternative and more challenging approach
is to use distributed computing, where not only are the individual reasoning, planning,
and scheduling AI tasks parallelized, but different modules with different such tasks
work concurrently toward a common goal [118, 119, 143].

In a DAI, one needs to ensure that the task has been modularized in a way that 
improves efficiency. Unfortunately, this usually requires a central controller whose 
purpose is to allocate tasks and process the associated results. Moreover, designing 
that controller in a traditional AI fashion often results in brittle solutions. Accord- 
ingly, there has recently been a move toward both more autonomous modules and 
fewer restrictions on the interactions among the modules [194]. 

Despite this evolution, DAI maintains the traditional AI concern with a prefixed
set of particular aspects of intelligent behavior (e.g., reasoning, understanding, and
learning) rather than with their cumulative character. As the idea that intelligence may
have more to do with the interaction among components started to take shape [41, 42],
focus shifted to concepts (e.g., multiagent systems) that better incorporated that idea
[121].



Multiagent Systems 

The field of multiagent systems (MAS) is concerned with the interactions among the
members of a set of agents [40, 92, 121, 204, 222], as well as the inner workings
of each agent in such a set (e.g., their learning algorithms) [36-38]. As in computational
ecologies and computational markets (discussed later), a well-designed MAS
is one that achieves a global task through the actions of its components. The associated
design steps involve [121]:

1. decomposing a global task into distributable subcomponents, yielding tractable
tasks for each agent;

2. establishing communication channels that provide sufficient information to each
of the agents for it to achieve its task, but that are not too unwieldy for the
overall system to sustain; and

3. coordinating the agents in a way that ensures that they cooperate on the global
task, or at the very least does not allow them to pursue conflicting strategies in
trying to achieve their tasks.

Step 3 is rarely trivial; one of the main difficulties encountered in MAS design 
is that agents act selfishly and artificial cooperation structures have to be imposed 
on their behavior to enforce cooperation [13]. An active area of research, which 
holds promise for addressing the design of collectives problem, is to determine how 
selfish agents’ “incentives” have to be engineered to avoid problems such as the 
tragedy of the commons (TOC) [209]. (This work draws on the economics literature, 
which we review separately later.) When simply providing the right incentives is not 
sufficient, one can resort to strategies that actively induce agents to cooperate rather 
than act selfishly. In such cases, coordination [205], negotiations [135], coalition 
formation [193, 195, 249], or contracting [3] among agents may be needed to ensure 
that they do not work at cross purposes. 

Unfortunately, all of these approaches share with DAI and its offshoots the prob- 
lem of relying on hand-tailoring and therefore are difficult to scale and often non- 
robust. In addition, except as noted in the next subsection, they involve little to no 
adaptivity, and therefore the constituent computational elements are usually not as 
robust as they need to be to provide the foundation for the field of collectives. 

Reinforcement Learning 

The maturing field of reinforcement learning (RL) provides a much needed tool for
the types of problems addressed by collectives. An RL agent receives reward signals
from its environment, and the goal of an RL algorithm is to determine how, using those
reward signals, the agent should update its action policy to maximize its utility
[123, 220, 221, 232]. Because RL generally provides model-free 7
and “online” learning features, it is ideally suited for the distributed environment
where a “teacher” is not available and the agents need to learn successful strategies
based on “rewards” and “penalties” they receive from the overall system at various
intervals. It is even possible for the learners to use those rewards to modify how they
learn [199, 200].

Although work on RL dates back to Samuel’s checker player [191], relatively
recent theoretical [232] and empirical results [56, 224] have made RL one of the most
active areas in machine learning. Many problems, ranging from controlling a robot’s
gait to controlling a chemical plant to allocating constrained resources, have been
addressed with considerable success using RL [97, 114, 166, 186, 247]. In particular,
the RL algorithms TD(λ) (which rates potential states based on a value function)
[220] and Q-learning (which rates action-state pairs) [232] have been investigated
extensively. A detailed investigation of RL is available in [123, 221, 232].
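
The method that rates action-state pairs is commonly known as Q-learning. A minimal
sketch of its standard tabular update follows. The function name, the dictionary-based
table, the state and action labels, and the values of the learning rate alpha and the
discount factor gamma are all illustrative assumptions rather than details taken from
the cited work.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new estimate."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference error: observed target minus current estimate.
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]

# Hypothetical two-action problem; unseen (state, action) pairs start at 0.
q_table = defaultdict(float)
actions = ["left", "right"]
first = q_learning_update(q_table, "s0", "right", 1.0, "s1", actions)
second = q_learning_update(q_table, "s0", "right", 1.0, "s1", actions)
```

Each call moves the estimate a fraction alpha of the way toward the observed target, so
no model of the environment is required.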

Intuitively, one might hope that RL would help us solve the distributed control
problem, because RL is adaptive and, in general, model-free. However, by itself,
conventional single-agent RL does not provide a means for controlling large distributed
systems. The problem is that the space of possible action policies for such systems
is too big to be searched. So although powerful and widely applicable, solitary RL
algorithms will not generally perform well on large distributed heterogeneous problems.
It is, however, natural to consider deploying many RL algorithms rather than a
single one for these large distributed problems.

7 There are some model-based variants of traditional RL. See, for example, [8].

Reinforcement Learning-Based Multiagent Systems 

Because it requires neither explicit modeling of the environment nor a “teacher” that 
provides the “correct” actions, the approach of having the individual agents in a MAS 
use RL is well-suited for MASs deployed in domains where one has little knowledge
about the environment or other agents. There are two main approaches to designing 
such MASs: 

1. One has “team game agents” that don’t know about each other and whose RL 
rewards are given by the performance of the entire system (so the joint actions 
of all other agents form an “inanimate background” contributing to the reward 
signal each agent receives). 

2. One has “social agents” that explicitly model each other and take each other’s
actions into account. 

Both 1 and 2 can be viewed as ways to (try to) coordinate the agents in a MAS in a 
robust fashion. 

Team game agents: MASs with team game agents have been successfully applied
to a multitude of problems [56, 96, 107, 192, 198]. However, scaling to large systems
is a major issue with team game agents. The problem is that each agent must be
able to discern the effect of its actions on the overall performance of the system,
because that performance constitutes its reward signal. However, as the number of
agents increases, the effects of any one agent’s actions (signal) will be swamped by
the effects of other agents (noise), making the agent unable to learn well, if at all. In
addition, of course, team game agents cannot be used in situations lacking centralized
calculation and broadcast of the single global reward signal.
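
The signal-to-noise problem can be illustrated with a toy simulation. The sketch below
assumes a deliberately simple model that is not taken from the text: every agent picks
an action of -1 or +1 at random, and the global reward is the sum of all actions. It
then measures how strongly one agent's action correlates with that shared reward.

```python
import random

def action_reward_correlation(n_agents, n_rounds=2000, seed=0):
    """Sample correlation between agent 0's action and the global reward."""
    rng = random.Random(seed)
    own, rewards = [], []
    for _ in range(n_rounds):
        actions = [rng.choice((-1, 1)) for _ in range(n_agents)]
        own.append(actions[0])
        rewards.append(sum(actions))  # toy global reward (assumption)
    mean_x = sum(own) / n_rounds
    mean_g = sum(rewards) / n_rounds
    cov = sum((x - mean_x) * (g - mean_g) for x, g in zip(own, rewards))
    var_x = sum((x - mean_x) ** 2 for x in own)
    var_g = sum((g - mean_g) ** 2 for g in rewards)
    return cov / (var_x * var_g) ** 0.5

alone = action_reward_correlation(1)      # the agent is the whole system
crowded = action_reward_correlation(100)  # 99 other agents act as noise
```

In this toy model the expected correlation falls off as one over the square root of the
number of agents, so a learner in a large system sees a reward that is almost
independent of its own choices.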

Social agents: MASs whose agents take the actions of other agents into account
synthesize RL with game-theoretic concepts (e.g., Nash equilibrium). They do this to
try to ensure that the overall system both moves toward achieving the overall global
goal and avoids often deleterious oscillatory behavior [53, 85, 111-113]. To that end,
the agents incorporate internal mechanisms that actively model the behavior of other
agents. In general, this approach involves hand-tailoring for the problem, and there
are some well-studied domains (e.g., the El Farol Bar problem) in which such modeling
is self-defeating [5, 238].

2.2 Game Theory 

Game theory is the branch of mathematics concerned with formalized versions of
“games,” in the sense of chess, poker, nuclear arms races, and the like [11, 19, 30, 66,
73, 87, 148, 207]. It is perhaps easiest to describe it by loosely defining some of its
terminology, which we do here and in the next section.

The simplest form of a game is that of the “noncooperative single-stage extensive- 
form” game, which involves the following situation: There are two or more agents 
(called players), each of which has a pre-specified set of possible actions that it can 
follow. (A “finite” game has finite sets of possible actions for all the players.) In 
addition, each agent i has a utility function (also called a payoff matrix for finite 
games). This maps any “profile” of the action choices of all agents to an associated 
utility value for agent i. (In a “zero-sum” game, for every profile, the sum of the 
payoffs to all the agents is zero.) 

The agents choose their actions in a sequence, one after the other. The structure 
determining what each agent knows concerning the action choices of the preceding 
agents is known as the “information set.” 8 Games in which each agent knows exactly 
what the preceding (leader) agent did are called Stackelberg games. 

In a multistage game, after all the agents choose their first action, each agent is 
provided some information concerning what the other agents did. The agent uses 
this information to choose its next action. In the usual formulation, each agent gets 
its payoff at the end of all of the game’s stages. 

An agent’s strategy is the rule it elects to follow mapping the information it has at 
each stage of a game to its associated action. It is a pure strategy if it is a deterministic 
rule. If the agent’s action is chosen by randomly sampling from a distribution, that 
distribution is known as a mixed strategy. Note that an agent’s strategy concerns all
possible sequences of provided information, even those that cannot arise due to the 
strategies of the other agents. 
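
To make the distinction between pure and mixed strategies concrete, the sketch below
uses a single-stage game, where a strategy reduces to a distribution over actions. The
game (matching pennies), the dictionary layout, and the function names are illustrative
assumptions.

```python
import random
from itertools import product

# Matching pennies (hypothetical example): payoffs[(a0, a1)] = (u0, u1).
payoffs = {
    ("H", "H"): (1, -1), ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1), ("T", "T"): (1, -1),
}

def sample_action(mixed_strategy, rng):
    """Draw one action from a mixed strategy given as {action: probability}."""
    actions = list(mixed_strategy)
    weights = [mixed_strategy[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

def expected_payoffs(payoffs, mixed_profile):
    """Expected utility of each player when all players randomize independently."""
    totals = [0.0] * len(mixed_profile)
    for profile in product(*(list(m) for m in mixed_profile)):
        prob = 1.0
        for action, strategy in zip(profile, mixed_profile):
            prob *= strategy[action]
        for i, u in enumerate(payoffs[profile]):
            totals[i] += prob * u
    return totals

pure_heads = {"H": 1.0, "T": 0.0}  # a pure strategy is a degenerate distribution
uniform = {"H": 0.5, "T": 0.5}     # a genuinely mixed strategy
even_game = expected_payoffs(payoffs, [uniform, uniform])
fixed_game = expected_payoffs(payoffs, [pure_heads, pure_heads])
```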

Any multistage extensive-form game can be converted into a normal-form game, 
which is a single-stage game in which each agent is ignorant of the actions of the 
other agents, so that all agents choose their actions simultaneously. This conversion is 
achieved by having the “actions” of each agent in the normal-form game correspond 
to an entire strategy in the associated multistage extensive-form game. The payoffs to
all the agents in the normal-form game for a particular strategy profile are then given
by the associated payoff matrices of the multistage extensive-form game.
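
The conversion can be sketched for the smallest interesting case: a two-player
sequential game in which the follower observes the leader's action. The payoff function
and all names below are hypothetical, chosen only to show how a follower "action" in
the normal form becomes a complete contingency plan.

```python
from itertools import product

def to_normal_form(leader_actions, follower_actions, payoff):
    """Build the normal-form payoff table of a two-stage leader/follower game."""
    # A follower strategy assigns a response to every possible leader action,
    # including leader actions that end up never being played.
    follower_strategies = [
        dict(zip(leader_actions, responses))
        for responses in product(follower_actions, repeat=len(leader_actions))
    ]
    table = {}
    for a in leader_actions:
        for j, strategy in enumerate(follower_strategies):
            table[(a, j)] = payoff(a, strategy[a])
    return follower_strategies, table

def payoff(a, b):
    # Hypothetical payoffs: the leader wants a match, the follower a mismatch.
    return (1, 0) if a == b else (0, 1)

strategies, table = to_normal_form(["U", "D"], ["U", "D"], payoff)
```

With two actions each, the follower has four strategies in the normal form, so the
table has eight entries even though only four action pairs can occur in play.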



8 Although stochastic choices of actions are central to game theory, most of the work in the
field assumes the information in information sets is in the form of definite facts, rather than
a probability distribution. Accordingly, there has been relatively little work incorporating
Shannon information theory into the analysis of information sets.



Nash Equilibrium

A solution to a game, or an equilibrium, is a profile in which every agent behaves
“rationally.” This means that every agent’s choice of strategy maximizes its utility
subject to a prespecified set of conditions. In conventional game theory those conditions
involve, at a minimum, perfect knowledge of the payoff matrices of all other
players and, often, specification of what strategies the other agents adopted. In particular,
a Nash equilibrium is a profile where each agent has chosen the best strategy
it can, given the choices of the other agents. A game may have no Nash equilibrium,
one equilibrium, or many equilibria in the space of pure strategies. A beautiful and
seminal theorem due to Nash proves that every finite game has at least one Nash
equilibrium in the space of mixed strategies [171].
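
For small finite games, pure-strategy Nash equilibria can be found by brute force:
check every profile for a profitable unilateral deviation. The sketch below does this
for two standard textbook games; the payoff numbers and function name are illustrative
assumptions.

```python
from itertools import product

def pure_nash_equilibria(action_sets, payoffs):
    """Return all profiles from which no single player gains by deviating."""
    equilibria = []
    for profile in product(*action_sets):
        def can_improve(i):
            # Does player i have any alternative action with a higher payoff,
            # holding everyone else's action fixed?
            return any(
                payoffs[profile[:i] + (alt,) + profile[i + 1:]][i]
                > payoffs[profile][i]
                for alt in action_sets[i]
            )
        if not any(can_improve(i) for i in range(len(action_sets))):
            equilibria.append(profile)
    return equilibria

prisoners_dilemma = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
matching_pennies = {
    ("H", "H"): (1, -1), ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1), ("T", "T"): (1, -1),
}
pd_equilibria = pure_nash_equilibria([["C", "D"], ["C", "D"]], prisoners_dilemma)
mp_equilibria = pure_nash_equilibria([["H", "T"], ["H", "T"]], matching_pennies)
```

The prisoner's dilemma has a single pure equilibrium, mutual defection, while matching
pennies has none and is only solved in mixed strategies.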

There are several reasons one might expect a game to result in a Nash equi- 
librium. One is that it is the point that perfectly rational Bayesian agents would 
adopt, assuming the probability distributions they used to calculate expected pay- 
offs were consistent with one another [10, 124]. A related reason, arising even in a 
non-Bayesian setting, is that a Nash equilibrium provides “consistent” predictions, 
in that if all parties predict that the game will converge to a Nash equilibrium, no one 
will benefit by changing strategies. Having a consistent prediction does not ensure 
that all agents’ payoffs are maximized, however. The study of small perturbations 
around Nash equilibria from a stochastic dynamics perspective is just one example 
of a “refinement” of a Nash equilibrium, which provides a criterion for selecting a 
single equilibrium state when more than one is present [154]. 

Cooperative Game Theory 

In cooperative game theory the agents are able to enter binding contracts with one 
another and thereby coordinate their strategies. This allows the agents to avoid being
“stuck” in Nash equilibria that are Pareto-inefficient, that is, being stuck at equilibrium
profiles in which all agents would benefit if only they could agree to all adopt
different strategies, with no possibility of betrayal. The characteristic function of a 
game involves subsets (‘coalitions’) of agents playing the game. For each such sub- 
set, it gives the sum of the payoffs of the agents in that subset that those agents can 
guarantee if they coordinate their strategies. An imputation is a division of such 
a guaranteed sum among the members of the coalition. It is often the case that for 
a subset of the agents in a coalition, one imputation dominates another, meaning 
that under threat of leaving the coalition that subset of agents can demand the first 
imputation rather than the second. So the problem each agent i is confronted with 
in a cooperative game is which set of other agents to form a coalition with, given 
the characteristic function of the game and the associated imputations i can demand 
of its partners. There are several kinds of solution for cooperative games that have 
received detailed study, varying in how the agents address this problem of who to 
form a coalition with. Some of the more popular are the “core,” the “Shapley value,” 
the “stable set solution,” and the “nucleolus.” 

In the real world, the actual underlying game the agents are playing does not 
only involve the actions considered in cooperative game theory’s analysis of coali- 
tions and imputations. The strategies of that underlying game also involve bargaining 
behavior, considerations of trying to cheat on a given contract, bluffing and threats, 
and the like. In many respects, by concentrating on solutions for coalition formation 
and their relation with the characteristic function, cooperative game theory abstracts 
away these details of the true underlying game. Conversely though, progress has 
recently been made in understanding how cooperative games can arise from nonco- 
operative games, as they must in the real world [11]. 
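
Of the solution concepts named above, the Shapley value is the simplest to compute
directly from a characteristic function: it averages each agent's marginal contribution
over all orders in which the coalition could be assembled. The three-player game below
is a hypothetical example, and the function names are assumptions.

```python
from itertools import permutations

def shapley_values(players, v):
    """Average marginal contribution of each player over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            enlarged = coalition | {p}
            totals[p] += v(enlarged) - v(coalition)
            coalition = enlarged
    return {p: total / len(orders) for p, total in totals.items()}

def v(coalition):
    # Hypothetical characteristic function: a coalition guarantees a payoff
    # of 1 only if it contains player "A" and at least one partner.
    return 1 if "A" in coalition and len(coalition) >= 2 else 0

values = shapley_values(["A", "B", "C"], v)
```

Here the indispensable player A receives two thirds of the total and the two
interchangeable partners one sixth each. Enumerating all orderings is only feasible
for a handful of players.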

Evolution and Learning in Games

Not surprisingly, game theory has come to play a large role in the field of multiagent 
systems. In addition, due to Darwinian natural selection, one might expect game 
theory to be important in population biology, in which the “utility functions” of the 
individual agents can be taken to be their reproductive fitness. There is an entire 
subfield of game theory concerned with this connection with population biology, 
called “evolutionary game theory” [155, 157]. 

To introduce evolutionary game theory, consider a game in which all players 
share the same space of possible strategies and there is an additional space of pos- 
sible “attribute vectors” that characterize an agent, along with a probability distribu- 
tion g across that new space. (Examples of attributes in the physical world could be 
things like size and speed.) We select a set of agents to play a game by randomly sam- 
pling g. Those agents’ attribute vectors jointly determine the payoff matrices of each 
agent. (Intuitively, the benefit that accrues to an agent for taking a particular action 
depends on its attributes and those of the other agents.) However, each agent i has 
limited information concerning both its attribute vector and that of the other players 
in the game, information encapsulated in an “information structure.” The information 
structure specifies how much each agent knows about the game it is playing. 

In this context, we enlarge the meaning of the term “strategy” to not just a mapping
from information sets and the like to actions, but from entire information structures
to actions. In addition to the distribution g over attribute vectors, we have a
distribution over strategies, h. A strategy s is a “population strategy” if h is a delta
function about s. Intuitively, we have a population strategy when each animal in a
population “follows the same behavioral rules,” rules that take as input what the animal
is able to discern about its strengths and weaknesses relative to other members
of the population and produce as output how the animal will act in the presence of
such animals.

Given g, a population strategy centered about s, and its own attribute vector, any
player i in the support of g has an expected payoff for any strategy it might adopt.
When i’s payoff could not improve if it were to adopt any strategy other than s, we
say that s is “evolutionary-stable.” Intuitively, an evolutionary-stable strategy is one
that is stable with respect to the introduction of mutants into the population.

Now consider a sequence of such evolutionary games. Interpret the payoff that 
any agent receives after being involved in such a game as the “reproductive fitness” 
of that agent, in the biological sense. So the higher the payoff the agent receives, in 
comparison to the fitnesses of the other agents, the more “offspring” it has that get 
propagated to the next game. In the continuum time limit, where games are indexed 
by the real number t, this can be formalized by a differential equation that specifies 
the derivative of g_t evaluated for each agent i’s attribute vector, as a monotonically
increasing function of the relative difference between the payoff of i and the aver- 
age payoff of all the agents. (We also have such an equation for h.) The resulting 
dynamics is known as “replicator dynamics,” with an evolutionary-stable population 
strategy, if it exists, being one particular fixed point of the dynamics. 
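
A common concrete instance of such dynamics is the standard replicator equation over
a finite set of strategies, in which the share of each strategy grows in proportion to
its payoff advantage over the population average. The sketch below is a simplification
of the setting described above (it tracks strategy shares only, not attribute vectors),
uses a simple Euler discretization, and assumes a hawk-dove payoff matrix chosen for
illustration.

```python
def replicator_step(shares, payoff_matrix, dt):
    """One Euler step of dx_i/dt = x_i * (f_i(x) - average fitness)."""
    # Fitness of strategy i against the current population mix.
    fitness = [sum(a * x for a, x in zip(row, shares)) for row in payoff_matrix]
    average = sum(x * f for x, f in zip(shares, fitness))
    return [x + dt * x * (f - average) for x, f in zip(shares, fitness)]

# Hypothetical hawk-dove game with resource value 2 and fight cost 4:
# rows and columns are (hawk, dove); entries are payoffs to the row strategy.
hawk_dove = [[-1.0, 2.0],
             [0.0, 1.0]]

shares = [0.1, 0.9]  # start with 10% hawks
for _ in range(5000):
    shares = replicator_step(shares, hawk_dove, dt=0.01)
```

For these payoffs the population settles at an even mix of hawks and doves, which is
the evolutionary-stable state of this particular game and a fixed point of the dynamics.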

Now consider removing the reproductive aspect of evolutionary game theory and
instead have each agent propagate to the next game, with “memory” of the events of 
the preceding game. Furthermore, allow each agent to modify its strategy from one 
game to the next by “learning” from its memory of past games in a bounded rational 
manner. The field of learning in games is concerned with exactly such situations 
[12, 17, 26, 70, 86, 126, 173, 178]. Most of the formal work in this field involves simple
models for the learning process of the agents. For example, in “fictitious play” [86], 
in each successive game, each agent i adopts what would be its best strategy if its 
opponents chose their strategies according to the empirical frequency distribution 
of such strategies that i has encountered in the past. More sophisticated versions of 
this work use simple Bayesian learning algorithms, or reinventions of some of the 
techniques of the RL community [190]. Typically in learning in games one defines 
a payoff to the agent for a sequence of games, for example, as a discounted sum of 
the payoffs in each of the constituent games. Within this framework one can study 
the long-term effects of strategies such as cooperation and see if they arise naturally, 
and if so, under what circumstances. 
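
Fictitious play, as described above, is short enough to sketch in full for two players.
The code below assumes simultaneous updates, ties broken toward the lowest-indexed
action, and matching pennies as the repeated game; all names are illustrative.

```python
def best_response(payoff, opponent_counts):
    """Best action against the empirical distribution of opponent actions."""
    # payoff[a][b] is this player's utility for playing a against b.
    expected = [
        sum(payoff[a][b] * opponent_counts[b] for b in range(len(opponent_counts)))
        for a in range(len(payoff))
    ]
    return max(range(len(expected)), key=expected.__getitem__)

def fictitious_play(payoff_row, payoff_col, rounds):
    """Return each player's empirical action frequencies after `rounds` games."""
    counts_row = [0] * len(payoff_row)
    counts_col = [0] * len(payoff_col)
    for _ in range(rounds):
        a = best_response(payoff_row, counts_col)
        b = best_response(payoff_col, counts_row)
        counts_row[a] += 1
        counts_col[b] += 1
    return ([c / rounds for c in counts_row], [c / rounds for c in counts_col])

# Matching pennies: the row player wants to match, the column player to differ.
row_payoff = [[1, -1], [-1, 1]]
col_payoff = [[-1, 1], [1, -1]]
row_freq, col_freq = fictitious_play(row_payoff, col_payoff, rounds=5000)
```

Neither player ever randomizes, yet the empirical frequencies drift toward the even
split that constitutes the mixed equilibrium of this game.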

Many aspects of real-world games that do not occur very naturally otherwise 
arise spontaneously in these kinds of games. For example, when the number of games 
to be played is not prefixed, it may behoove a particular agent i to treat its opponent 
better than it would otherwise, because i may have to rely on that other agent’s 
treating it well in the future, if they end up playing each other again. This framework 
also allows us to investigate the dependence of evolving strategies on the amount 
of information available to the agents [159]; the effect of communication on the 
evolution of cooperation [160, 162]; and the parallels between auctions and economic 
theory [108,161]. 

In many respects, learning in games is even more relevant to the study of col- 
lectives than is traditional game theory. However, in general, it lacks a well-defined 
world utility and is almost exclusively focused on the forward problem, making it a 
difficult starting point for a field of collectives. 

2.3 Other Social Science-Inspired Systems 

Some human economies provide examples of naturally occurring systems that can
be viewed as (more or less) well-performing collectives. However, the field of economics
provides much more. Both empirical economics (e.g., economic history,
experimental economics) and theoretical economics (e.g., general equilibrium the- 
ory [4], theory of optimal taxation [164]) provide a rich literature on strategic sit- 
uations where many parties interact. In fact, much of economics can be viewed as 
concerning how to maximize certain constrained kinds of world utilities, when there 
are certain (very strong) restrictions on the individual agents and their interactions, 
in particular when we have limited freedom in setting the utility functions of those 
agents. 

Mechanism Design

One way to try to induce a large collective to reach an equilibrium point without
centralized control is via an auction. 9 (This is the approach usually used in computational
markets, as discussed later.) Along with optimal taxation and public good
theory [137], the design of auctions is the subject of the field of mechanism design.
Broadly defined, mechanism design is concerned with the incentives that must be applied
to any set of agents that interact and exchange goods [87, 164, 229] in order to
get those agents to exhibit desired behavior. Usually the desired behavior concerns
prespecified “inherent” utility functions of some sort for each agent. In particular,
mechanism design is often concerned with the incentives that must be superimposed
on such inherent utility functions to guide the agents to a “(Pareto)-efficient” (or
“Pareto-optimal”) point, that is, to a point in which no agent’s inherent utility can be
improved without hurting another agent’s inherent utility [86, 87].
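
The Pareto-optimality criterion can be checked mechanically when the candidate
outcomes are listed as vectors of agent utilities. The sketch below is a brute-force
filter; the outcome list reuses hypothetical prisoner's-dilemma payoffs.

```python
def dominates(u, v):
    """True if outcome u is at least as good as v for everyone and better for someone."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_optimal(outcomes):
    """Keep the outcomes that no other outcome dominates."""
    return [u for u in outcomes if not any(dominates(v, u) for v in outcomes)]

# Utility pairs for (cooperate, cooperate), (cooperate, defect),
# (defect, cooperate), and (defect, defect) in a hypothetical game.
outcomes = [(3, 3), (0, 5), (5, 0), (1, 1)]
efficient = pareto_optimal(outcomes)
```

Only mutual defection, the outcome (1, 1), is filtered out, since both agents would
do better under mutual cooperation.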

One particularly important type of such an incentive scheme is an auction. When 
many agents interact in a common environment there often needs to be a structure 
that supports the exchange of goods or information among those agents. Auctions 
provide one such (centralized) structure for managing exchanges of goods. For ex- 
ample, in the English auction all the agents come together and “bid” for a good, and 
the price of the good is increased until only one bidder remains, who gets the good 
in exchange for the resource bid. As another example, in the Dutch auction the price 
of a good is decreased until one buyer is willing to pay the current price. 
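
Both auction formats can be sketched in a few lines. The code assumes, purely for
illustration, that every bidder has a private valuation, stays in the English auction
while the price does not exceed it, and accepts the Dutch price as soon as it falls to
that valuation (real bidders may behave more strategically). Names and numbers are
hypothetical.

```python
def english_auction(valuations, step=1):
    """Raise the price until at most one bidder is still willing to pay."""
    price = 0
    active = set(valuations)
    while len(active) > 1:
        staying = {b for b in active if valuations[b] >= price + step}
        if not staying:
            break  # remaining bidders are tied; stop at the current price
        price += step
        active = staying
    winner = max(sorted(active), key=valuations.get)
    return winner, price

def dutch_auction(valuations, start, step=1):
    """Lower the price until some bidder accepts it."""
    price = start
    while price > 0:
        takers = sorted(b for b, v in valuations.items() if v >= price)
        if takers:
            return takers[0], price
        price -= step
    return None, 0

valuations = {"a": 10, "b": 7, "c": 4}
english_result = english_auction(valuations)
dutch_result = dutch_auction(valuations, start=12)
```

Under these assumptions the same bidder wins both auctions, but pays 8 in the English
auction (one increment above the runner-up's valuation) and 10 in the Dutch auction.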

All auctions perform the same task: They match supply and demand. As such, 
auctions are one of the ways in which price equilibration among a set of interacting 
agents can be achieved. However, very few world utilities have their maximum occur 
at a point that is Pareto-optimal for the preset inherent utility functions. Accordingly, 
unless we are very fortunate in the relation between those inherent utility functions 
and the (in general, separately specified) world utility, knowing how to induce such 
a Pareto-optimal point is of little value. For example, in a transaction in an English 
auction both the seller and the buyer benefit. They may even have arrived at an allo- 
cation that is efficient. However, because the winner may have been willing to pay 
more for the good, such an outcome may confound the goal of the market designer, 
if that designer’s goal is to maximize revenue. This point is returned to later, in the 
context of computational economics. 

9 We do not discuss general equilibrium theory in detail here, because although it deals with
the interaction among multiple markets to set the market “clearing” price for goods, it
is not appropriate for the study of collectives: It requires centralized control (a Walrasian
auctioneer), does not allow for dynamic interactions, and in general, there is no reason to
believe that having the markets clear maximizes a world utility.

Another, perhaps more intuitive, perspective is to view the restrictions of mechanism
design as concerning the private utility functions of the individual agents.
Typically in mechanism design the private utility function for each agent $\eta$, which
maps states of the entire world (including the internal state of the agent itself) to $\mathbb{R}$,
is of the form $\gamma_\eta(x_{\eta,1}, x_{\eta,2}, \ldots, x_{\eta,n}, T_\eta(y_{\eta,1}, y_{\eta,2}, \ldots, y_{\eta,m}))$,
where $\gamma_\eta(\cdot)$ is agent $\eta$’s prefixed inherent utility function; the
$x_{\eta,1}, x_{\eta,2}, \ldots, x_{\eta,n}$ constitute the first $n$ of the $n + k$ variables that that
function depends on; and $T_\eta(\cdot)$ is the $\mathbb{R}^k$-valued “mechanism” function the designer
can set, the $y_{\eta,1}, y_{\eta,2}, \ldots, y_{\eta,m}$ being the variables making up its arguments.
Unlike the private utility, world utility can depend on all of the
$x_{\eta,1}, \ldots, x_{\eta,n}, y_{\eta,1}, y_{\eta,2}, \ldots, y_{\eta,m}$ directly (as well as on other entirely
different variables). As an example, the $y_{\eta,1}, y_{\eta,2}, \ldots, y_{\eta,m}$ could be a set of all
agents’ bids at an auction; $T_\eta(\cdot)$ could be $\mathbb{R}^2$-valued, giving the amount of change in
$\eta$’s owned quantities of both money and the item up for bid; and the
$x_{\eta,1}, \ldots, x_{\eta,n}$ could parameterize $\eta$’s happiness trade-off relating owned quantities
of the good and of money.

Typically $\gamma_\eta(\cdot)$ and the choice of what variables make up the arguments
$y_{\eta,1}, y_{\eta,2}, \ldots, y_{\eta,m}$ to $T_\eta$ are fixed a priori, with only the function $T_\eta(\cdot)$
allowed to vary in the design. In addition, often there are a priori restrictions on the
functional form of the $T_\eta$. For example, often the $T_\eta$ are not allowed to vary with
$\eta$. More precisely, usually they must be invariant under the transformation
$\eta \to \eta'$ in both the index to the function and the indices to its arguments. This means,
in particular, that the designer can’t “cheat” and have the functional forms of the $T_\eta$
vary from one $\eta$ to another in a way that reflects the variations across the (often
predetermined) associated vectors $(x_{\eta,1}, \ldots, x_{\eta,n})$. For example, typically an auction
mechanism determines who gets what goods for what price in a manner that is independent
of the identities of the bidding agents and does not directly reflect any internal
happiness trade-off parameters of the agents that aren’t reflected in their bids.
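
The structure of such a private utility, an inherent utility applied to the agent's own
fixed parameters together with the output of a designer-chosen mechanism function, can
be sketched as follows. Everything specific here is an assumption made for illustration:
the mechanism is a first-price sealed-bid auction returning a pair (change in money,
change in items), and the inherent utility is linear in money with a single trade-off
parameter.

```python
def mechanism(agent, bids):
    """Designer-chosen function: maps all bids to the agent's (d_money, d_items).

    The same rule is applied to every agent, so the outcome depends only on
    the bids and not on who submitted them.
    """
    winner = max(sorted(bids), key=bids.get)
    if agent == winner:
        return (-bids[agent], 1)  # pay own bid, receive the item
    return (0, 0)

def inherent_utility(value_per_item, d_money, d_items):
    """Prefixed inherent utility; value_per_item is the agent's own parameter."""
    return value_per_item * d_items + d_money

def private_utility(value_per_item, agent, bids):
    """Inherent utility evaluated on the mechanism's output."""
    d_money, d_items = mechanism(agent, bids)
    return inherent_utility(value_per_item, d_money, d_items)

bids = {"a": 5, "b": 3}
utility_a = private_utility(8, "a", bids)  # wins and pays 5 for an item worth 8
utility_b = private_utility(4, "b", bids)  # loses, so nothing changes
```

In this sketch the designer is free to replace the body of mechanism, for example with
a different pricing rule, but not to change inherent_utility or to make the rule depend
on an agent's identity or private parameter.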

From the perspective of a collective, these kinds of restrictions on private utilities 
only hold in a small subset of the potential computational problems and constitute a 
severe handicap in other scenarios. Another limitation of most of the work on mech- 
anism design is that either it assumes a particular computational model for the agent 
or (more commonly) it focuses on (game-theoretic) equilibria. This limited nature of 
the treatment of off-equilibrium scenarios is intimately related to the restrictions on 
the form of the private utility. If there are no restrictions on the private utilities, then 
there is a trivial solution for how to set such utilities to maximize the world utility 
at equilibrium: Have each such utility simply equal the world utility, in a so-called 
“team game.” To have the analysis be nontrivial, restrictions like those on the private 
utilities are needed. 

In practice though, no real system is at a game-theoretic equilibrium, due to
bounded rationality. In particular, this means that if one considers mechanism design
in the limiting case of no restrictions on $\gamma(\cdot)$, the associated “mechanism design
solution” of a team game often will result in poor performance [238]. Team theory
[105, 153] is one approach that has been tried to circumvent this problem. The
idea there is to remove all notions of a private or inherent utility and solve directly 
for the strategy profile that will maximize the world utility. Needless to say, though, 
such an approach becomes extraordinarily difficult for all but the simplest problems 
and requires centralized, completely personalized control and communication and 
exact modeling of the system’s dynamics. 
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Computational Economics 

“Computational economies” are schemes inspired by economics, and more specifi- 
cally, by general equilibrium theory and mechanism design theory, for managing the 
components of a distributed computational system. They work by having a “compu- 
tational market,” akin to an auction, guide the interactions among those components. 
Such a market is defined as any structure that allows the components of the system to 
exchange information on relative valuation of resources (as in an auction), establish 
equilibrium states (e.g., determine market clearing prices), and exchange resources 
(i.e., engage in trades). 

Such computational economies can be used to investigate real economies and 
biological systems [31,34,35, 128]. They can also be used to design distributed com- 
putational systems. For example, such computational economies are well-suited to 
some distributed resource allocation problems, where each component of the system 
can either directly produce the “goods” it needs or acquire them through trades with 
other components. Computational markets often allow for far more heterogeneity 
in the components than do conventional resource allocation schemes. Furthermore, 
there is both theoretical and empirical evidence suggesting that such markets are of- 
ten able to settle to equilibrium states. For example, auctions find prices that satisfy 
both the seller and the buyer, which results in an increase in the utility of both (else 
one or the other would not have agreed to the sale). Assuming that all parties are 
free to pursue trading opportunities, such mechanisms move the system to a point 
where all possible bilateral trades that could improve the utility of both parties are 
exhausted. 

Now restrict attention to the case, implicit in much of computational market
work, with the following characteristics: First, world utility can be expressed as a
monotonically increasing function $F$, where each argument $i$ of $F$ can in turn be
interpreted as the value of a prespecified utility function $f_i$ for agent $i$. Second, each of
those $f_i$ is a function of an $i$-indexed “goods vector” $x_i$ of the nonperishable goods
“owned” by agent $i$. The components of that vector are $x_{i,j}$, and the overall system
dynamics is restricted to conserve the vector $\sum_i x_{i,j}$. (There are also some other,
more technical conditions.) As an example, the resource allocation problem can be
viewed as concerning such vectors of “owned” goods.

Due to the second of our two conditions, one can integrate a market-clearing 
mechanism into any system of this sort. Due to the first condition, because in a market 
equilibrium with nonperishable goods no (rational) agent ends up with a value of 
its utility function lower than the one it started with, the value of the world utility 
function must be higher at equilibrium than it was initially. In fact, so long as the 
individual agents are smart enough to avoid all trades in which they do not benefit, 
any computational market can only improve this kind of world utility, even if it does 
not achieve the market equilibrium. 
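
The argument above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the concave utility form, the choice of F as a plain sum, and all function names (`utility`, `world_utility`, `run_market`) are hypothetical and do not come from the text. Agents receive random bilateral trade proposals and accept only those in which both parties strictly benefit; the total amount of each good is conserved, and the world utility never decreases.

```python
import math
import random


def utility(weights, goods):
    """Hypothetical concave utility f_i: weighted sum of log(1 + amount held)."""
    return sum(w * math.log1p(x) for w, x in zip(weights, goods))


def world_utility(weights, holdings):
    """Assumed F: the sum of the f_i, which is monotonically increasing in each f_i."""
    return sum(utility(w, x) for w, x in zip(weights, holdings))


def run_market(n_agents=4, n_goods=3, n_proposals=2000, seed=0):
    rng = random.Random(seed)
    weights = [[rng.uniform(0.1, 1.0) for _ in range(n_goods)] for _ in range(n_agents)]
    holdings = [[rng.uniform(1.0, 5.0) for _ in range(n_goods)] for _ in range(n_agents)]
    totals_before = [sum(h[j] for h in holdings) for j in range(n_goods)]
    history = [world_utility(weights, holdings)]

    for _ in range(n_proposals):
        a, b = rng.sample(range(n_agents), 2)   # two distinct agents
        j, k = rng.sample(range(n_goods), 2)    # two distinct goods
        give_j = rng.uniform(0.0, holdings[a][j])  # a offers some of good j
        give_k = rng.uniform(0.0, holdings[b][k])  # b offers some of good k

        new_a, new_b = list(holdings[a]), list(holdings[b])
        new_a[j] -= give_j
        new_b[j] += give_j
        new_b[k] -= give_k
        new_a[k] += give_k

        # A trade happens only if both agents strictly benefit.
        a_gains = utility(weights[a], new_a) > utility(weights[a], holdings[a])
        b_gains = utility(weights[b], new_b) > utility(weights[b], holdings[b])
        if a_gains and b_gains:
            holdings[a], holdings[b] = new_a, new_b
        history.append(world_utility(weights, holdings))

    totals_after = [sum(h[j] for h in holdings) for j in range(n_goods)]
    return history, totals_before, totals_after


if __name__ == "__main__":
    history, before, after = run_market()
    print("world utility: start", history[0], "end", history[-1])
    print("goods totals before", before, "after", after)
```

Note that the market need not reach equilibrium for the world utility to improve; the sketch only relies on agents refusing trades that do not benefit them.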

This line of reasoning provides one of the main reasons to use computational 
markets in those situations in which they can be applied. Conversely, it underscores 
one of the major limitations of such markets: Starting with an arbitrary world utility 
function with arbitrary dynamical restrictions, it may be quite difficult to cast that 
function as a monotonically increasing F taking as arguments a set of agents’ 
goods-vector-based utilities $f_i$, if we require that those $f_i$ be well-enough behaved that we 
can reasonably expect the agents to optimize them in a market setting. 

One example of a computational economy being used for resource allocation is 
Huberman and Clearwater’s use of a double-blind auction to solve the complex task 
of controlling the temperature of a building. In this case, each agent (individual tem- 
perature controller) bids to buy or sell cool or warm air. This market mechanism 
leads to an equitable temperature distribution in the system [116]. Other domains 
where market mechanisms were successfully applied include purchasing memory 
in an operating system [50], allocating virtual circuits [75], “stealing” unused CPU 
cycles in a network of computers [69, 230], predicting option futures in financial 
markets [185], and numerous scheduling and distributed resource allocation prob- 
lems [138,142,210,218,234,235]. 

Computational economics can also be used for tasks not tightly coupled to re- 
source allocation. For example, following the work of Maes [151] and Ferber [74], 
Baum shows how by using computational markets many agents can interact and co- 
operate to solve a variant of the blocks world problem [22, 23]. However, market- 
based computational economics relies on both centralized communication and cen- 
tralized control to some degree, raising scalability issues. Furthermore, in practice, 
the applicability of computational economies depends greatly on the domain [225], 
making it a difficult starting point for a field of collectives. 



2.4 Biologically Inspired Systems 

Properly speaking, biological systems do not involve utility functions and search 
across them with learning algorithms. However, it has long been appreciated that 
there are many ways in which viewing biological systems as involving searches over 
such functions can lead to a deeper understanding of them [203,244]. Conversely, 
some have argued that the mechanism underlying biological systems can be used to 
help design search algorithms [109]. 10 

These kinds of reasoning, which relate utility functions and biological systems, 
have traditionally focused on the case of a single biological system operating in some 
external environment. If we extend this kind of reasoning to a set of biological sys- 
tems that are coevolving with one another, then we have essentially arrived at bio- 
logically based collectives. This section discusses how some previous work in the 
literature bears on this relationship between collectives and biology. 

Population Biology and Ecological Modeling 

The fields of population biology and ecological modeling are concerned with the 
large-scale “emergent” processes that govern the systems that consist of many 
(relatively) simple entities interacting with one another [24,99]. As usually cast, the 
“simple entities” are members of one or more species, and the interactions are a 
mathematical abstraction of the process of natural selection as it occurs in biological 
systems (involving processes like genetic reproduction of various sorts, 
genotype-phenotype mappings, inter- and intra-species competitions for resources, etc.). 
Population biology and ecological modeling in this context address questions 
concerning the dynamics of the resultant ecosystem, in particular how its long-term behavior 
depends on the details of the interactions between the constituent entities. Broadly 
construed, the paradigm of ecological modeling can even be extended to study how 
natural selection and self-regulating feedback create a stable planetwide ecological 
environment known as Gaia [144]. 

10 See [150,236] for some counterarguments to the particular claims most commonly made 
in this regard. 

The underlying mathematical models of other fields can often be usefully modi- 
fied to apply to the kinds of systems in which population biology is interested [14]. 
(See also the discussion in the earlier section on game theory.) Conversely, the un- 
derlying mathematical models of population biology and ecological modeling can 
be applied to other nonbiological systems. In particular, those models shed light on 
social issues such as the emergence of language or culture, warfare, and economic 
competition [71,72, 88]. They also can be used to investigate more abstract issues 
concerning the behavior of large complex systems with many interacting compo- 
nents [89,98,156,176,184]. 

Going a bit further afield, an approach that is related in spirit to ecological mod- 
eling is “computational ecologies.” These are large distributed systems where each 
component of the system acts (seemingly) independently, resulting in complex global 
behavior. Those components are viewed as constituting an “ecology” in an abstract 
sense (although much of the mathematics is not derived from the traditional field 
of ecological modeling). In particular, one can investigate how the dynamics of the 
ecology is influenced by the information available to each component and how 
cooperation and communication among the components affect that dynamics [115, 117]. 

Although in some ways the most closely related to collectives of the current 
ecology-inspired research, the fields of population biology and computational ecolo- 
gies do not provide a full science of collectives. These fields are primarily concerned 
with the “forward problem” of determining the dynamics that arises from certain 
choices of the underlying system. Unless one’s desired dynamics is sufficiently close 
to some dynamics that was previously catalogued (during one’s investigation of the 
forward problem), one has very little information on how to set up the components 
and their interactions to achieve that desired dynamics. 



Swarm Intelligence 

The field of “swarm intelligence” is concerned with systems that are modeled af- 
ter social insect colonies, so that the different components of the system are queen, 
worker, soldier, etc. It can be viewed as ecological modeling in which the individ- 
ual entities have extremely limited computing capacity or action sets and in which 
there are very few types of entities. The premise of the field is that the rich behavior 
of social insect colonies arises not from the sophistication of any individual entity 
in the colony, but from the interaction among those entities. The objective of 
current research is to uncover kinds of interactions among the entity types that lead to 
prespecified behavior of some sort. 

More speculatively, the study of social insect colonies may also provide insight 
into how to achieve learning in large distributed systems. This is because at the level 
of the individual insect in a colony, very little (or no) learning takes place. However, 
across evolutionary time scales, the social insect species as a whole functions as if 
the various individual types in a colony had “learned” their specific functions. The 
“learning” is the direct result of natural selection. (See the discussion on this topic in 
the section on ecological modeling.) 

Swarm intelligences have been used to adaptively allocate tasks [33, 136], solve 
the traveling salesman problem [62, 63] and route data efficiently in dynamic net- 
works [32,201,219]. However, there is no general framework for adapting swarm 
intelligences to maximize particular world utility functions. Accordingly, such intel- 
ligences generally need to be hand-tailored for each application. 

2.5 Physics-Based Systems 

Statistical Physics 

Equilibrium statistical physics is concerned with the stable-state character of large 
numbers of very simple physical objects, interacting according to well-specified local 
deterministic laws, with probabilistic noise processes superimposed [6, 188]. Typi- 
cally, there is no sense in which such systems can be said to have centralized control, 
because all particles contribute comparably to the overall dynamics. 

Aside from mesoscopic statistical physics, the numbers of particles considered 
are usually huge (e.g., $10^{23}$), and the particles themselves are extraordinarily 
simple, typically having only a few degrees of freedom. Moreover, the noise processes 
usually considered are highly restricted, formed by “baths” of heat, particles, and 
the like. Similarly, almost all of the field restricts itself to deterministic laws that are 
readily encapsulated in Hamilton’s equations (Schrödinger’s equation and its 
field-theoretic variants for quantum statistical physics). In fact, much of equilibrium 
statistical physics isn’t even concerned with the dynamic laws by themselves (as, for 
example, stochastic Markov processes are). Rather, it is concerned with invariants of 
those laws (e.g., energy), invariants that relate the states of all of the particles. Deter- 
ministic laws without such readily discoverable invariants are outside the purview of 
much of statistical physics. 

One potential use of statistical physics for collectives involves taking the systems 
that statistical physics analyzes, especially those analyzed in its condensed matter 
variant (e.g., spin glasses [213,214]), as simplified models of a class of collectives. 
This approach is used in some of the analyses of the El Farol Bar problem, also 
called the minority game (discussed later) [5,48]. It is used more overtly in (for 
example) the work of Galam [90], in which the equilibrium coalitions of a set of 
“countries” are modeled in terms of spin glasses. This approach cannot provide a 
general collectives framework, however. This is due to its not providing a general 
solution to arbitrary collectives inversion problems, being only concerned with the 
kinds of systems discussed earlier, and to its not using RL algorithms. 11 

Another contribution that statistical physics can make is with the mathematical 
techniques it has developed for its own purposes, such as mean field theory, self- 
averaging approximations, phase transitions, Monte Carlo techniques, the replica 
trick, and tools to analyze the thermodynamic limit in which the number of particles 
goes to infinity. Although such techniques have not yet been applied to collectives, 12 
they have been successfully applied to related fields. This is exemplified by the 
use of the replica trick to analyze two-player zero-sum games with random payoff 
matrices in the thermodynamic limit of the number of strategies in [27]. Other ex- 
amples are the numeric investigation of the iterated prisoner’s dilemma played on a 
lattice [223], the analysis of stochastic games by expressing deviation from rational- 
ity in the form of a “heat bath” [154], and the use of topological entropy to quantify 
the complexity of a voting system studied in [158]. 

Other recent work in the statistical physics literature is formally identical to that 
in other fields but has a novel perspective. A good example of this is [211], which is 
concerned with the problem of controlling a spatially extended system with a single 
controller using an algorithm identical to a simple-minded proportional RL algorithm 
(in essence, a rediscovery of RL). 

Action Extremization 

Much of the theory of physics can be cast as solving for the extremization of an 
actional, which is a functional of the worldline of an entire (potentially 
many-component) system across all time. The solution to that extremization problem 
constitutes the actual worldline followed by the system. In this way the calculus of 
variations can be used to solve for the worldline of a dynamic system. As an example, 
simple Newtonian dynamics can be cast as solving for the worldline of the system 
that extremizes the time integral of a quantity called the “Lagrangian,” which is a 
function of the system’s state along that worldline and certain parameters (e.g., the 
“potential energy”) governing the system at hand. In this instance, the calculus of 
variations simply results in Newton’s laws. 
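
As a hedged numerical sketch of the Newtonian example, the code below discretizes the action (the time integral of kinetic minus potential energy) for a particle in uniform gravity with fixed endpoints, and compares the Newtonian trajectory against randomly perturbed trajectories. The setup (unit mass, uniform gravity, midpoint discretization) and the names `action`, `newtonian_path`, and `perturbed` are illustrative assumptions, not part of the text. For this particular system the extremum is a minimum, so every perturbed path should have a larger action.

```python
import random


def action(path, dt, mass=1.0, g=9.81):
    """Discretized action: sum over segments of (kinetic - potential) * dt."""
    total = 0.0
    for x0, x1 in zip(path, path[1:]):
        velocity = (x1 - x0) / dt
        height = 0.5 * (x0 + x1)  # midpoint height of the segment
        lagrangian = 0.5 * mass * velocity ** 2 - mass * g * height
        total += lagrangian * dt
    return total


def newtonian_path(n_steps, total_time, g=9.81):
    """Heights x(t) = (g/2) t (T - t): thrown up at t=0, back at height 0 at t=T."""
    dt = total_time / n_steps
    return [0.5 * g * (k * dt) * (total_time - k * dt) for k in range(n_steps + 1)]


def perturbed(path, scale, rng):
    """Randomly displace the interior points; the endpoints stay fixed."""
    inner = [x + rng.uniform(-scale, scale) for x in path[1:-1]]
    return [path[0]] + inner + [path[-1]]


if __name__ == "__main__":
    rng = random.Random(0)
    n_steps, total_time = 50, 2.0
    dt = total_time / n_steps
    true_path = newtonian_path(n_steps, total_time)
    print("action of Newtonian path:", action(true_path, dt))
    print("action of a perturbed path:", action(perturbed(true_path, 0.1, rng), dt))
```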

If we take the dynamic system to be a collective, we are assured that its worldline 
automatically maximizes a “world utility” consisting of the value of the associated 
actional. If we change physical aspects of the system that determine the functional 
form of the actional (e.g., change the system’s potential energy function), then we 
change the world utility, and we are assured that our collective maximizes that new 
world utility. Counterintuitive physical systems, like the strings-and-springs systems 
that exhibit Braess’ paradox [20], are simply systems for which the world utility 
implicit in our human intuition is extremized at a point different from the one that 
extremizes the system’s actional. 

11 In regard to the latter point, however, it’s interesting to speculate about recasting statistical 
physics as a collective, by viewing each of the particles in the physical system as running 
an “RL algorithm” that perfectly maximizes the “utility function” of its Lagrangian, given 
the “actions” of the other particles. In this perspective, many-particle physical systems are 
multistage games that are at Nash equilibrium in each stage. So, for example, a frustrated 
spin glass is such a system at a Nash equilibrium that is not Pareto-optimal. 

12 Preliminary results in combining Monte Carlo techniques with collectives have yielded 
improvements of several orders of magnitude over traditional Monte Carlo techniques [240]. 

The challenge in exploiting this to solve the design of collectives problem is in 
translating an arbitrary provided global goal for the collective into a parameterized 
actional. Note that that actional must govern the dynamics of the collective, and 
the parameters of the actional must be physical variables in the collective, variables 
whose values we can modify. 

Active Walker Models 

The field of active walker models [21, 100, 101] is concerned with modeling “walk- 
ers” (be they human walkers or simple physical objects) crossing fields along trajec- 
tories, where those trajectories are a function of several factors, including the trails 
already worn into the field. Often the kind of trajectories considered are those that 
can be cast as solutions to actional extremization problems so that the walkers can 
be explicitly viewed as agents maximizing a private utility. 

One of the primary concerns of the field of active walker models is how the 
trails worn in the field change with time to reach a final equilibrium state. The prob- 
lem of how to design the cement pathways in the field (and other physical features 
of the field) so that the final paths actually followed by the walkers will have certain 
desirable characteristics is then one of solving for parameters of the actional that will 
result in the desired worldline. This is a special instance of the inverse problem of 
how to design a collective. 

Using active walker models this way to design collectives, like action extrem- 
ization in general, probably has limited applicability. It is also not clear how robust 
such a design approach might be or whether it would be scalable and exempt from 
the need for hand-tailoring. 

2.6 Other Related Subjects 

This section presents a “catch-all” of other fields that have little in common with one 
another and, while either still nascent or not extremely closely related to collectives, 
bear some relation to collectives. 

Stochastic Fields 

An extremely well-researched body of work concerns the mathematical and nu- 
meric behavior of systems for which the probability distribution over possible fu- 
ture states conditioned on preceding states is explicitly provided. This work involves 
many aspects of Monte Carlo numerical algorithms [172], all aspects of Markov 
chains [80, 177, 215] and especially of Markov fields, a topic that encompasses the 
Chapman-Kolmogorov equations [91] and their variants: Liouville’s equation, the 
Fokker-Planck equation, and the detailed-balance equation. Nonlinear dynamics is 
also related to this body of work (see the synopsis of iterated function systems and 
the synopsis of cellular automata later), as are Markov competitive decision processes 
(see the earlier synopsis of game theory). 

Formally, one can cast the problem of designing a collective as fixing each of 
the conditional transition probability distributions of the individual elements of a 
stochastic field so that the aggregate behavior of the overall system is of a desired 
form. 13 

Amorphous Computing and Control of Smart Matter 

Amorphous computing grew out of the idea of replacing traditional computer design, 
with its requirements for high reliability of the components of the computer, with a 
novel approach in which widespread unreliability of those components would not 
interfere with the computation [1,2]. Some of its more speculative aspects are con- 
cerned with “how to program” a massively distributed, noisy system of components 
that may consist in part of biochemical or biomechanical components [131, 233]. 
Work here has tended to focus on schemes for how to robustly induce desired ge- 
ometric dynamics across the physical body of the amorphous computer — an issue 
closely related to morphogenesis — and thereby lend credence to the idea that bio- 
chemical components are a promising approach. 

Especially in its limit of computers with very small constituent components, 
amorphous computing is closely related to the field of nanotechnology [64]. As 
the prospect of nanotechnology-driven mechanical systems gets more concrete, 
the daunting problem of how to robustly control, power, and sustain protean sys- 
tems made up of extremely large sets of nano-scale devices looms more impor- 
tant [95, 96, 107]. If this problem were to be solved, one would in essence have 
“smart matter.” For example, one would be able to “paint” an airplane wing with 
such matter and have it improve drag and lift properties significantly. 

Self-Organizing Systems 

The concepts of self-organization and self-organized criticality [15] were originally 
developed to help understand why many distributed physical systems are attracted to 
critical states that possess long-range dynamic correlations in the large-scale 
characteristics of the system. They provide a powerful framework for analyzing both 
biological and economic systems. For example, natural selection (particularly 
punctuated equilibrium [68, 93]) can be likened to a self-organizing dynamical system, and 
some have argued that it shares many properties (e.g., scale invariance) of such 
systems [57]. Similarly, one can view the economic order that results from the actions 
of human agents as a case of self-organization [59]. The relationship between 
complexity and self-organization is a particularly important one, in that it provides the 
potential laws that allow order to arise from chaos [125]. 

13 In contrast, in the field of Markov decision processes discussed in [45], the full system may 
be a Markov field, but the system designer only sets the conditional transition probability 
distribution of at most a few of the field elements, to the appropriate “decision rules.” 
Unfortunately, it is hard to imagine how to use the results of this field to design collectives 
because of major scaling problems. Any decision process must accurately model likely 
future modifications to its own behavior, often an extremely daunting task [150]. What’s 
worse, if multiple such decision processes are running concurrently in the system, each 
such process must also model the others, potentially in their full complexity. 

Adaptive Control Theory 

Adaptive control [7, 196], and in particular adaptive control involving locally weighted 
RL algorithms [9, 165], constitute a broadly applicable framework for controlling 
small, potentially inexactly modeled systems. Augmented by techniques in the con- 
trol of chaotic systems [52,60,61], they constitute a very successful way of solving 
the “inverse problem” for such systems. Unfortunately, it is not clear how one could 
even attempt to scale such techniques up to the massively distributed systems of 
interest in collectives. 



3 COIN Framework 

The previous section provided a summary of different fields that address various 
issues pertinent to the field of collectives. In this section, we summarize the COIN 
(collective intelligence) framework, which is one of the first frameworks that aims to 
bridge the gap between the needs of the field of collectives and the extant analysis 
and design methods. 14 

3.1 Central Equation 

Let Z be an arbitrary vector space whose elements z give the joint move of all agents 
in the system (i.e., z specifies the full “worldline” consisting of the actions and states 
of all the agents). The world utility G(z) is a function of the full worldline, and we 
are concerned with the problem of finding the z that maximizes G(z). 

In addition to G, for each agent $\eta$, there is a private utility function $g_\eta$. The 
agents act to improve their individual private utility functions, even though we, as 
system designers, are only concerned with the value of the world utility G. To specify 
all agents other than $\eta$, we will use the notation $\hat{\eta}$. 

Our uncertainty concerning the behavior of the system is reflected in a probability 
distribution over Z. Our ability to control the system consists of setting the value 
of some characteristic of the agents, e.g., setting the private functions of the agents. 
Indicating that value by s, our analysis revolves around the following central equation 
for P(G | s), which follows from Bayes’ theorem: 

$$P(G \mid s) = \int d\epsilon_G \, P(G \mid \epsilon_G, s) \int d\epsilon_g \, P(\epsilon_G \mid \epsilon_g, s) \, P(\epsilon_g \mid s), \tag{1}$$

where $\epsilon_g$ is the vector of the “intelligences” of the agents with respect to their 
associated private functions and $\epsilon_G$ is the vector of the intelligences of the agents with 
respect to G. Intuitively, these vectors indicate what percentage of $\eta$’s actions would 
have resulted in lower utility. 15 In this chapter, we use intelligence vectors as 
decomposition variables for Equation 1. 

14 The full COIN theory is presented in Chapter 2. 
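
One way to see where Equation 1 comes from is to marginalize over the two intelligence vectors and then apply the product rule. The steps below are a reading of the equation rather than a derivation taken from the text; in particular, the last step assumes that G is conditionally independent of $\epsilon_g$ given $\epsilon_G$ and s, an assumption that the full theory in Chapter 2 may state differently.

```latex
\begin{align*}
P(G \mid s)
  &= \int d\epsilon_G \int d\epsilon_g \, P(G, \epsilon_G, \epsilon_g \mid s) \\
  &= \int d\epsilon_G \int d\epsilon_g \,
     P(G \mid \epsilon_G, \epsilon_g, s) \, P(\epsilon_G \mid \epsilon_g, s) \, P(\epsilon_g \mid s) \\
  &= \int d\epsilon_G \, P(G \mid \epsilon_G, s)
     \int d\epsilon_g \, P(\epsilon_G \mid \epsilon_g, s) \, P(\epsilon_g \mid s).
\end{align*}
```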

Note that $\epsilon_{g_\eta}(z) = 1$ means that player $\eta$ is fully rational at z, in that its move 
maximizes the value of its utility given the moves of the other players. In other words, a 
point z where $\epsilon_{g_\eta}(z) = 1$ for all players $\eta$ is one that meets the definition of a 
game-theory Nash equilibrium. On the other hand, a z at which all components of $\epsilon_G = 1$ 
is a local maximum of G (or, more precisely, a critical point of the G(z) surface). 
If we can get these two vectors to be identical, then if the agents do well enough at 
maximizing their private utilities we are assured to be near a local maximum of G. 

To formalize this, consider our decomposition of P(G | s). If we can choose s 
so that the third conditional probability in the integrand, $P(\epsilon_g \mid s)$, is peaked around 
vectors $\epsilon_g$ all of whose components are close to 1 (that is, agents are able to “learn” 
their tasks), then we have likely induced large private utility intelligences (this issue 
is traditionally addressed in the field of machine learning). If we can also have the 
second term, $P(\epsilon_G \mid \epsilon_g, s)$, be peaked about $\epsilon_G$ equal to $\epsilon_g$ (that is, the private 
and world utilities are aligned), then $\epsilon_G$ will also be large (this issue is traditionally 
addressed in mechanism design). Finally, if the first term in the integrand, 
$P(G \mid \epsilon_G, s)$, is peaked about high G when $\epsilon_G$ is large, then our choice of s will likely 
result in high G, as desired (this issue arises in fields such as operations research and 
evolutionary programming). 

3.2 Factoredness and Learnability 

For high values of G to be achieved in a collective, the private utility functions of the 
agents need to satisfy two properties (i.e., have good form for the second and third 
terms of Equation 1). 16 First, the private utility functions need to be “aligned with G,” 
a need expressed in the second term of Equation 1. In particular, regardless of the 
details of the stochastic environment in which the agents operate, or of the details of 
the learning algorithms of the agents, if $\epsilon_g$ equals $\epsilon_G$ exactly for all z, the desired 
form for the second term in Equation 1 is assured. For such systems, we have: 

$$g_\eta(z) \geq g_\eta(z') \iff G(z) \geq G(z') \quad \forall z, z' \ \text{s.t.} \ z_{\hat{\eta}} = z'_{\hat{\eta}}$$

Intuitively, for all pairs of states z and z' that differ only for agent $\eta$, a change in $\eta$’s 
state that increases its private utility cannot decrease the world utility. We call such a 
system factored. In game theory language, the private utility function Nash 
equilibria of a factored system are local maxima of G. In addition to this desirable 
equilibrium behavior, factored systems automatically provide appropriate off-equilibrium 
incentives to the agents (an issue generally not considered in the game theory and 
mechanism design literature). 

15 Intelligence is formally defined in Chapter 2. 

16 Non-game-theory-based function maximization techniques like simulated annealing instead 
address how to have term 1 have the desired form. They do this by trying to ensure that the 
local maxima that the underlying system ultimately settles near have high G, by “trading 
off exploration and exploitation.” One can combine such term-1-based techniques with 
the techniques presented here. The resultant hybrid algorithm, addressing all three terms, 
outperforms simulated annealing by more than two orders of magnitude [240]. 
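
The factoredness condition can be checked by brute force on a small system. The sketch below is illustrative only: the congestion-style world utility, the three candidate private utilities, and all names (`world_utility`, `team_game`, `difference_null`, `preference`, `is_factored`) are assumptions introduced for the example. It enumerates every pair of joint moves that differ only in one agent's move and tests whether the private utility and G always order the pair the same way.

```python
from itertools import product

N_AGENTS, N_ACTIONS = 3, 3


def world_utility(z):
    """Hypothetical G: negative congestion. A move of None marks a removed agent."""
    counts = [sum(1 for move in z if move == a) for a in range(N_ACTIONS)]
    return -sum(c * c for c in counts)


def team_game(z, agent):
    """Every agent's private utility is G itself."""
    return world_utility(z)


def difference_null(z, agent):
    """G minus G with the agent removed; the subtracted term ignores the agent's move."""
    removed = z[:agent] + (None,) + z[agent + 1:]
    return world_utility(z) - world_utility(removed)


def preference(z, agent):
    """A private utility unrelated to G: the agent simply prefers higher-numbered moves."""
    return z[agent]


def is_factored(private_utility):
    """Check g(z) >= g(z') <=> G(z) >= G(z') for all z, z' differing only in one agent."""
    for z in product(range(N_ACTIONS), repeat=N_AGENTS):
        for agent in range(N_AGENTS):
            for move in range(N_ACTIONS):
                z_alt = z[:agent] + (move,) + z[agent + 1:]
                g_order = private_utility(z, agent) >= private_utility(z_alt, agent)
                world_order = world_utility(z) >= world_utility(z_alt)
                if g_order != world_order:
                    return False
    return True


if __name__ == "__main__":
    for name, g in [("team game", team_game),
                    ("difference (agent removed)", difference_null),
                    ("own-move preference", preference)]:
        print(name, "factored:", is_factored(g))
```

In this toy system the first two utilities pass the check, while the preference utility fails because an agent can raise its own utility by moving onto an already crowded action.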

Second, we want the agents’ private utility functions to have high learnability, 
intuitively meaning that an agent’s utility should be sensitive to its own actions and 
insensitive to the actions of others. This requirement that private utility functions 
have high “signal-to-noise” ratios arises in the third term. As an example, consider a 
“team game” where the private utility functions are set to G [56]. Such a system is 
tautologically factored. However, team games often have low learnability, because in 
a large system an agent will have a difficult time discerning the effects of its actions 
on G. As a consequence, each $\eta$ may have difficulty achieving high $g_\eta$ in such a 
system. Loosely speaking, agent $\eta$’s learnability is the ratio of the sensitivity of $g_\eta$ to 
$\eta$’s actions to the sensitivity of $g_\eta$ to the actions of all other agents. So at a given state z, 
the higher the learnability, the more $g_\eta(z)$ depends on the move of agent $\eta$, i.e., the 
better the associated signal-to-noise ratio for $\eta$. Intuitively then, higher learnability 
means it is easier for $\eta$ to achieve a large value of its utility. 
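
To make the signal-to-noise picture concrete, the sketch below computes a crude learnability proxy on a toy system. The proxy is an assumption made for illustration (the formal definition is in Chapter 2): the average absolute change in an agent's private utility when only that agent flips its move, divided by the average absolute change when only the other agents' joint move varies. The world utility and the names `team_game`, `difference_utility`, and `learnability` are likewise hypothetical.

```python
from itertools import product

N_AGENTS, TARGET = 5, 2


def world_utility(z):
    """Hypothetical G: highest when exactly TARGET agents choose move 1."""
    return -(sum(z) - TARGET) ** 2


def team_game(z, agent):
    """Private utility equal to G."""
    return world_utility(z)


def difference_utility(z, agent):
    """G minus G with the agent's move fixed to 0 (a term independent of its move)."""
    fixed = z[:agent] + (0,) + z[agent + 1:]
    return world_utility(z) - world_utility(fixed)


def learnability(private_utility, agent=0):
    """Crude proxy: own-move sensitivity divided by other-agents sensitivity."""
    states = list(product((0, 1), repeat=N_AGENTS))
    own_total, own_count = 0.0, 0
    others_total, others_count = 0.0, 0
    for z in states:
        flipped = z[:agent] + (1 - z[agent],) + z[agent + 1:]
        own_total += abs(private_utility(z, agent) - private_utility(flipped, agent))
        own_count += 1
        for z_alt in states:
            if z_alt[agent] == z[agent]:  # only the other agents' moves differ
                others_total += abs(private_utility(z, agent)
                                    - private_utility(z_alt, agent))
                others_count += 1
    return (own_total / own_count) / (others_total / others_count)


if __name__ == "__main__":
    print("team game proxy:", learnability(team_game))
    print("difference utility proxy:", learnability(difference_utility))
```

Both utilities respond identically to the agent's own move in this example, so any gap between the two proxies comes entirely from how strongly each utility reacts to the other agents.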

3.3 Difference Utilities 

It is possible to solve for the set of all private utilities that are factored with respect to 
a particular world utility. Unfortunately, in general it is not possible for a collective 
both to be factored and to have perfect learnability (i.e., no dependence of any $g_\eta$ 
on any agent other than $\eta$) for all of its agents [238]. However, 
consider difference utilities, which are of the form: 

$$DU(z) = G(z) - \Gamma(f(z)), \tag{2}$$

where $\Gamma(f)$ is independent of $z_\eta$. Such difference utilities are factored [238]. In 
addition, under usually benign approximations, learnability is maximized over the 
set of difference utilities by choosing 

$$\Gamma(f(z)) = E(G \mid z_{\hat{\eta}}, s) \tag{3}$$

up to an overall additive constant. We call the resultant difference utility the 
Aristocrat utility (AU). If each player $\eta$ uses an appropriately rescaled version of the 
associated AU as its private utility function, then we have ensured good form for 
both the second and third terms in Equation 1. 

Using AU in practice is sometimes difficult, due to the need to evaluate the 
expectation value. Fortunately, there are other utility functions that, while easier to 
evaluate than AU, still are both factored and possess superior learnability to the team 
game utility, $g_\eta = G$. One such private utility function is the Wonderful Life utility 
(WLU). The WLU for player $\eta$ is parameterized by a prefixed clamping parameter 
$CL_\eta$ chosen from among $\eta$’s possible moves: 

$$WLU_\eta = G(z) - G(z_{\hat{\eta}}, CL_\eta). \tag{4}$$

Figure 1. This example shows the impact of the clamping operation on the joint state of a 
four-agent system where each agent has three possible actions and each such action is represented 
by a three-dimensional unary vector. The first matrix represents the joint state of the system z 
where agent 1 has selected action 1, agent 2 has selected action 3, agent 3 has selected action 
1, and agent 4 has selected action 2. The second matrix displays the effect of clamping agent 
2’s action to the “null” vector (i.e., replacing $z_{\eta_2}$ with $\vec{0}$). The third matrix shows the effect 
of instead clamping agent 2’s move to the “average” action vector $\vec{a} = \{.33, .33, .33\}$, which 
amounts to replacing that agent’s move with the “illegal” move of fractionally taking each 
possible move ($z_{\eta_2} = \vec{a}$). 

WLU is factored no matter what the choice of clamping parameter. Furthermore, 
while not matching the high learnability of AU, WLU usually has much better 
learnability than does a team game, because most of the “noise” due to other agents is 
removed from $\eta$’s utility. Therefore, WLU generally results in better performance 
than do team game utilities [228,238]. 

Figure 1 provides an example of clamping. As in that example, in many 
circumstances there is a particular choice of clamping parameter for agent $\eta$ that is 
a “null” move for that agent, equivalent to removing that agent from the system. 
For such a clamping parameter WLU is closely related to the economics technique 
of “endogenizing a player’s (agent’s) externalities,” for example, with the Groves 
mechanism [87, 174, 175]. 
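
The clamping operation described in the caption of Figure 1 can be sketched in a few lines. The joint state below follows the caption (agent 1 takes action 1, agent 2 action 3, agent 3 action 1, agent 4 action 2), but the world utility is a made-up congestion penalty chosen only so that the example has something to evaluate, and the helper names (`one_hot`, `clamp`, `wlu`) are hypothetical. Exact fractions are used for the “average” clamp to avoid rounding.

```python
from fractions import Fraction


def one_hot(action, n_actions=3):
    """Unary vector for a 1-based action index."""
    return [1 if a == action else 0 for a in range(1, n_actions + 1)]


def world_utility(joint_state):
    """Hypothetical G: negative sum of squared loads on each action (column sums)."""
    loads = [sum(column) for column in zip(*joint_state)]
    return -sum(load * load for load in loads)


def clamp(joint_state, agent, clamp_vector):
    """Replace one agent's row (0-based index) with the clamping vector."""
    return [clamp_vector if i == agent else row for i, row in enumerate(joint_state)]


def wlu(joint_state, agent, clamp_vector):
    """Equation 4: G(z) minus G with the agent's move clamped."""
    return world_utility(joint_state) - world_utility(clamp(joint_state, agent, clamp_vector))


# Joint state from the Figure 1 caption.
z = [one_hot(1), one_hot(3), one_hot(1), one_hot(2)]
NULL = [0, 0, 0]                 # "null" move: the agent is effectively removed
AVERAGE = [Fraction(1, 3)] * 3   # "average" move: one third of each action

if __name__ == "__main__":
    agent_2 = 1  # 0-based index of agent 2
    print("G(z):", world_utility(z))
    print("WLU, null clamp:", wlu(z, agent_2, NULL))
    print("WLU, average clamp:", wlu(z, agent_2, AVERAGE))
```

Because the clamped term does not depend on the clamped agent's actual move, changing that move shifts WLU by exactly the same amount as it shifts G, which is why the choice of clamping parameter does not affect factoredness.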

However, it is usually the case that using WLU with a clamping parameter that 
is as close as possible to the expected move defining AU results in much higher 
leamability than does clamping to the null move. Such a WLU is roughly akin to 
a mean-field approximation to AU . 17 For example, in Figure 1, if the probabilities 
of player 2 making each of its possible moves was 1/3, then one would expect that 

17 Formally, our approximation is exact only if the expected value of G equals G evaluated
at the expected joint move (both expectations being conditioned on given moves by all
players other than $\eta$). In general, however, for relatively smooth G, we would expect such
a mean-field approximation to AU to give good results, even if the approximation does not
hold exactly.
a clamping parameter of a would be close to optimal. Accordingly, in practice, use
of such an alternative WLU derived as a “mean-field approximation” to AU almost 
always results in better values of G than does the “endogenizing” WLU. 
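
The mean-field reading can be checked numerically on the Figure 1 configuration. The sketch below assumes that AU for an agent is G(z) minus the expected value of G over that agent's own moves, with the other agents' moves held fixed; that form, the hypothetical crowding-penalty G, and all function names are assumptions made for illustration, not definitions taken from this section.

```python
def world_utility(z):
    """Hypothetical G: G(z) = -sum_k x_k**2 over per-action totals x_k."""
    n_actions = len(z[0])
    totals = [sum(row[k] for row in z) for k in range(n_actions)]
    return -sum(x * x for x in totals)


def with_row(z, agent, row):
    """Copy of z in which `agent` makes the (possibly fractional) move `row`."""
    return [list(row) if i == agent else list(r) for i, r in enumerate(z)]


def one_hot(k, n):
    return [1.0 if j == k else 0.0 for j in range(n)]


def aristocrat_utility(z, agent, probs):
    """Assumed AU: G(z) minus E[G] over the agent's own moves."""
    n = len(probs)
    expected_g = sum(p * world_utility(with_row(z, agent, one_hot(k, n)))
                     for k, p in enumerate(probs))
    return world_utility(z) - expected_g


def wlu(z, agent, clamp_vector):
    return world_utility(z) - world_utility(with_row(z, agent, clamp_vector))


z = [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
probs = [1 / 3, 1 / 3, 1 / 3]          # player 2 moves uniformly at random

au = aristocrat_utility(z, 1, probs)   # uses E[G]
wlu_mean_field = wlu(z, 1, probs)      # uses G at the expected move
wlu_null = wlu(z, 1, [0, 0, 0])        # uses G with the agent removed
```

Because this G is quadratic rather than linear, E[G] differs from G evaluated at the expected move, so the mean-field WLU is not exactly AU here; it is nevertheless much closer to AU than the null-clamped WLU is.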

Intuitively, collectives having factored and highly learnable private utilities like
AU and WLU can be viewed as akin to well-run human companies. G is the “bottom
line” of the company, the players $\eta$ are identified with the employees of that
company, and the associated $g_\eta$ are given by the employees’ performance-based
compensation packages. For example, for a “factored company,” each employee’s
compensation package contains incentives designed such that the better the bottom line of
the corporation, the greater the employee’s compensation. As an example, the CEO 
of a company wishing to have the private utilities of the employees be factored with 
G may give stock options to the employees. The net effect of this action is to ensure 
that what is good for the employee is also good for the company. In addition, if the 
compensation packages are “highly learnable,” the employees will have a relatively 
easy time discerning the relationship between their behavior and their compensation. 
In such a case, the employees will both have the incentive to help the company and 
be able to determine how best to do so. Note that in practice, providing stock options 
is usually more effective in small companies than in large ones. This makes perfect 
sense in terms of the formalism summarized earlier because such options generally 
have higher learnability in small companies than in large companies, in which each
employee has a hard time seeing how his or her moves affect the company’s stock 
price. 

3.4 Summary of COIN Results to Date 

In earlier work, we tested the WLU for distributed control of network packet
routing [241], achieving substantially better throughput than by using the best possible
shortest-path-based system [241], even though that SPA-based system has
information denied to the agents in the WLU-based collective. In related work we have shown
that use of the WLU automatically avoids the infamous Braess’ paradox, in which 
adding new links can actually decrease throughput, a situation that readily ensnares 
SPAs [228,239]. 

We have also applied the WLU to the problem of controlling communication
across a constellation of satellites to minimize the importance-weighted loss of
scientific data flowing across that constellation [237]. We have also shown that agents
using utility functions derived from the COIN framework significantly improve per- 
formance in the problem of job scheduling across a heterogeneous computing grid 
[227]. 

In addition, we have explored COIN-based techniques on variants of congestion
games [238, 242, 243], in particular on a more challenging variant of Arthur’s El
Farol Bar attendance problem [5] (also known as the “minority game” [48]). In this
work we showed that use of the WLU can result in performance orders of
magnitude superior to that of team game utilities. We have also successfully applied COIN
techniques to the problem of coordinating a set of autonomous rovers to maximize
the importance-weighted value of a set of locations they visit [226].

Finally, we have also explored applying COIN techniques to problems that are
explicitly cast as search. These include setting the states of the spins in a spin glass to
minimize energy, the conventional bin-packing problem of computer science, and a
model of human agents connected in a small-world network who have to synchronize
their purchase decisions [240].



4 Applications and Problems Driving Collectives 

The previous sections focused on fields that provide solutions to problems arising 
in the field of collectives. To complement them, in this section we present three 
problems that are particularly suited to being approached from the field of collectives 
and that provide fertile ground for testing novel theories of collectives. 

4.1 El Farol Bar Problem (Minority Game) 

The “El Farol” Bar problem (also known as the minority game) and its variants pro- 
vide a clean and simple testbed for investigating certain kinds of interactions among 
agents [5, 44, 47, 206]. In the original version of the problem, which arose in eco- 
nomics, at each time step (each “night”), each agent needs to decide whether to 
attend a particular bar. The goal of the agent in making this decision depends on the 
total attendance at the bar on that night. If the total attendance is below a preset ca- 
pacity then the agent should have attended. Conversely, if the bar is overcrowded on 
the given night, then the agent should not attend. (Because of this structure, the bar 
problem with capacity set to 50% of the total number of agents is also known as the 
“minority game”; each agent selects one of two groups at each time step, and those 
that are in the minority have made the right choice). The agents make their choices 
by predicting ahead of time whether the attendance on the current night will exceed 
the capacity and then taking the appropriate course of action. 

What makes this problem particularly interesting is that it is impossible for each 
agent to be perfectly “rational,” in the sense of correctly predicting the attendance 
on any given night. This is because if most agents predict that the attendance will be 
low (and therefore decide to attend), the attendance will actually be high, and if they 
predict the attendance will be high (and therefore decide not to attend) the attendance 
will be low. (In the language of game theory, this essentially amounts to the property 
that there are no pure-strategy Nash equilibria [49, 246].) Alternatively, viewing the 
overall system as a collective, it has a prisoner’s dilemma-like nature, in that “ratio- 
nal” behavior by all the individual agents thwarts the global goal of maximizing total 
enjoyment (defined as the sum of all agents’ enjoyment and maximized when the bar 
is exactly at capacity). 
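
The self-defeating character of any shared prediction can be seen in a few lines. The sketch below assumes that every agent acts on the same forecast, and it uses a hypothetical total-enjoyment function that peaks when attendance equals capacity; the specific form `x * exp(-x / capacity)`, the numbers, and the names are illustrative assumptions rather than definitions from the text.

```python
import math


def shared_attendance(predict_crowded, n_agents):
    """Attendance when every agent acts on the same prediction:
    all stay home if a crowd is predicted, otherwise all attend."""
    return 0 if predict_crowded else n_agents


def is_crowded(attendance, capacity):
    return attendance > capacity


def total_enjoyment(attendance, capacity):
    """Hypothetical world utility, maximized at attendance == capacity."""
    return attendance * math.exp(-attendance / capacity)


N_AGENTS, CAPACITY = 100, 60

# For each possible shared forecast, record whether it turned out correct.
prediction_was_right = {}
for predict_crowded in (False, True):
    attendance = shared_attendance(predict_crowded, N_AGENTS)
    prediction_was_right[predict_crowded] = (
        is_crowded(attendance, CAPACITY) == predict_crowded
    )

# Attendance level that maximizes the assumed world utility.
best_attendance = max(range(N_AGENTS + 1),
                      key=lambda x: total_enjoyment(x, CAPACITY))
```

Both shared forecasts falsify themselves, while the attendance that maximizes the assumed world utility sits exactly at capacity, a level that neither unanimous choice reaches.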

This frustration effect is a crisp example of the difficulty that can arise when 
agents try to model agents that are in their turn modeling the first agents. It is similar 
to what occurs in spin glasses in physics, and it makes the bar problem closely related 
to the physics of emergent behavior in distributed systems [46-48, 248]. Researchers
have also studied the dynamics of the bar problem to investigate economic
properties like competition, cooperation, and collective behavior and their relationship to
market efficiency [58, 122, 197]. 

4.2 Data Routing in a Network 

Packet routing in a data network [28, 94, 110, 127, 212, 231] presents a particularly
interesting domain for the investigation of collectives. In particular, with such
routing:

1. the problem is inherently distributed;

2. for all but the most trivial networks it is impossible to employ global control; 

3. the routers only have access to local information (routing tables); 

4. it constitutes a relatively clean and easily modified experimental testbed; and 

5. there are potentially major bottlenecks induced by “greedy” behavior on the part 
of the individual routers, where the behavior constitutes a readily investigated 
instance of the tragedy of the commons (TOC). 

Many of the approaches to packet routing incorporate a variant on RL [39,43, 
51, 147, 152]. Q-routing is perhaps the best known such approach and is based on 
routers using reinforcement learning to select the best path [39]. Although generally 
successful, Q-routing is not a general scheme for inverting a global task. This is 
true even if one restricts attention to the problem of routing in data networks; there 
exists a global task in such problems, but that task is directly used to construct the 
algorithm. 
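
For readers unfamiliar with Q-routing, the following is a minimal sketch of the per-router update as it is commonly described in the reinforcement learning literature: each router keeps an estimated delivery time per (destination, neighbor) pair and moves that estimate toward the observed local delay plus the chosen neighbor's own best estimate. The class, its method names, and the learning rate are illustrative assumptions, not details given in this survey.

```python
from collections import defaultdict


class QRouter:
    """One router's table of estimated delivery times (a sketch)."""

    def __init__(self, neighbors, learning_rate=0.5):
        self.neighbors = list(neighbors)
        self.learning_rate = learning_rate
        # (destination, neighbor) -> estimated time to deliver via neighbor
        self.q = defaultdict(float)

    def best_neighbor(self, destination):
        """Greedy choice: the neighbor with the lowest estimated time."""
        return min(self.neighbors, key=lambda y: self.q[(destination, y)])

    def best_estimate(self, destination):
        """Value this router reports back to whoever sent it the packet."""
        return min(self.q[(destination, y)] for y in self.neighbors)

    def update(self, destination, neighbor, queue_delay, link_delay,
               neighbor_estimate):
        """Move the estimate toward local delay + neighbor's best estimate."""
        target = queue_delay + link_delay + neighbor_estimate
        key = (destination, neighbor)
        self.q[key] += self.learning_rate * (target - self.q[key])


router = QRouter(neighbors=["B", "C"])
router.update("D", "B", queue_delay=1.0, link_delay=1.0, neighbor_estimate=4.0)
router.update("D", "C", queue_delay=1.0, link_delay=1.0, neighbor_estimate=0.0)
next_hop = router.best_neighbor("D")
```

Each router here optimizes only its own delivery-time estimates, which illustrates the point made above: the update is built around one specific global task (fast delivery) rather than derived from an arbitrary world utility.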

A particular version of the general packet-routing problem that is receiving
increased attention is the quality of service (QoS) problem, where different
communication packets (voice, video, data) share the same bandwidth resource but
have widely varying importances to both the user and (via revenue) the bandwidth 
provider. Determining which packet has precedence over other packets in such cases 
is not only based on priority in arrival time but more generally on the potential effects 
on the income of the bandwidth provider. In this context, RL algorithms have been 
used to determine routing policy, control call admission, and maximize revenue by 
allocating the available bandwidth efficiently [43, 152]. 

Many researchers have exploited the noncooperative game-theoretic understand- 
ing of the TOC to explain the bottleneck character of empirical data networks’ be- 
havior and suggest potential alternatives to current routing schemes [25,67, 132, 
133, 139, 141, 179, 180,208]. Closely related is work on various “pricing”-based 
resource-allocation strategies in congestable data networks [149]. This work is at 
least partially based on current understanding of pricing in toll lanes and traffic flow 
in general. All of these approaches are of particular interest when combined with 
the RL-based schemes mentioned earlier. Due to these factors, much of the current 
research on a general framework for collectives is directed toward the packet-routing 
domain (see the next section). 

4.3 Traffic Theory

Traffic congestion typifies the tragedy of the commons public good problem: Everyone
wants to use the same resource, and when all parties greedily try to maximize their
use of that resource, they worsen both global behavior and their own private utility (e.g., if
everyone disobeys traffic lights, everyone gets stuck in traffic jams). Indeed, in the
well-known Braess’ paradox [20,54,55, 134], keeping everything else constant — 
including the number and destinations of drivers — but opening a new traffic path 
can increase everyone’s time to get to their destination. (Viewing the overall system 
as an instance of the prisoner’s dilemma, this paradox in essence arises through the 
creation of a novel “defect-defect” option for the overall system.) Greedy behavior 
on the part of individuals also results in very rich global dynamic patterns, such as 
stop-and-go waves and clusters [102, 103]. 
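
A standard textbook instance of Braess’ paradox makes the effect concrete. The network and numbers below are the commonly used illustrative ones (4000 drivers, two routes each made of one congestible link costing drivers/100 minutes and one fixed 45-minute link, plus an optional zero-cost shortcut between the midpoints); they are not taken from this survey, and the function names are assumptions.

```python
def time_without_shortcut(n_drivers):
    """Equilibrium without the shortcut: drivers split evenly over two
    symmetric routes, each one congestible link plus one fixed link."""
    per_route = n_drivers / 2
    return per_route / 100 + 45


def time_with_shortcut(n_drivers):
    """Equilibrium with a free shortcut: every driver uses both
    congestible links, joined by the zero-cost shortcut."""
    return n_drivers / 100 + 0 + n_drivers / 100


def time_if_one_driver_deviates(n_drivers):
    """A lone driver who skips the shortcut still shares the first
    congestible link with everyone, then takes the fixed 45-minute link."""
    return n_drivers / 100 + 45


N = 4000
before = time_without_shortcut(N)          # 20 + 45 minutes
after = time_with_shortcut(N)              # 40 + 0 + 40 minutes
deviation = time_if_one_driver_deviates(N) # 40 + 45 minutes
```

With the shortcut open, no individual driver gains by switching back (the deviation is slower still), yet every driver's trip is longer than before the shortcut existed.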

Much of traffic theory employs and investigates tools that have previously been 
applied in statistical physics [102, 129, 130, 183, 187] (see the preceding section).
In particular, the spontaneous formation of traffic jams provides a rich testbed for 
studying the emergence of complex activity from seemingly chaotic states [102, 104]. 
Furthermore, the dynamics of traffic flow is particularly amenable to the application 
and testing of many novel numerical methods in a controlled environment [16,29, 
202]. Many experimental studies have confirmed the usefulness of applying insights 
gleaned from such work to real-world traffic scenarios [102, 169, 170]. 



5 Challenge Ahead 

A collective is any multiagent system in which each agent adaptively tries to maxi- 
mize its own private utility function, while at the same time there is an overall world 
utility that rates the behavior of the entire system. Collectives are quite common 
in the natural world, canonical examples being human organizations (e.g., a com- 
pany), an ecosystem, or organelles in a living cell. In addition, as computing becomes 
ubiquitous, artificial systems that constitute collectives are rapidly increasing. Such 
systems include data networks, arrays of communication satellites, teams of rovers, 
amorphous computers, and national airspace. 

The fundamental problem in analyzing and designing such systems is in deter- 
mining how the combined actions of many agents lead to “coordinated” behavior on 
the global scale. Unfortunately, though they provide valuable insight on some aspects 
of collectives, none of the fields discussed in this survey can be modified to meet all 
the requirements of a “field” of collectives. This is not too surprising because none
of those fields were explicitly formed to design or analyze collectives, but rather they 
touched on certain aspects of collectives. What is needed is a fundamentally new ap- 
proach, one that may borrow from the various fields but will not simply extend an 
existing field. 

To that end, this survey provides a common language for studying collectives 
and highlights the benefits and shortcomings of the many fields related to collec- 
tives. Furthermore, it outlines some of the work ahead if a science of collectives is to 
emerge. The types of answers that future work on collectives can and will uncover
are difficult to predict. It is a vast and rich area of research, a new field at the intersec- 
tion of new needs and new capabilities. And although this survey does not provide 
those answers, it is our hope that it provides some of the essential questions that need 
to be addressed if the fledgling field of collectives is to mature into a new science.

Summary. In this chapter an analysis of the behavior of an arbitrary (perhaps massive)
collective of computational processes in terms of an associated “world” utility function is presented.
We concentrate on the situation where each process in the collective can be viewed as though 
it were striving to maximize its own private utility function. For such situations the central 
design issue is how to initialize and update the collective’s structure, in particular the pri- 
vate utility functions, so as to induce the overall collective to behave in a way that has large 
values of the world utility. Traditional “team game” approaches to this problem simply set 
each private utility function equal to the world utility function. The “collective intelligence” 
(COIN) framework is a semiformal set of heuristics that have recently been used to construct 
private utility functions that in many experiments have resulted in world utility values up to 
orders of magnitude superior to that ensuing from use of the team game utility. In this chapter 
we introduce a formal mathematics for analyzing and designing collectives. We also use this
mathematics to suggest new private utilities that should outperform the COIN heuristics in 
certain kinds of domains. In accompanying work we use that mathematics to explain previ- 
ous experimental results concerning the superiority of COIN heuristics. In that accompanying 
work we also use the mathematics to make numerical predictions, some of which we then 
test. In this way these two papers establish the study of collectives as a proper science, in- 
volving theory, explanation of old experiments, prediction concerning new experiments, and 
engineering insights. 



1 Introduction 

This chapter concerns distributed systems, some of whose components can be viewed 
as though they were agents, adaptively “trying” to induce large values of their 
associated private utility functions. When combined with a world utility function 
that rates the possible behaviors of that system, the system is known as a collec- 
tive [17,20,23,26]. 

Given a collective, there is an associated inverse design problem: how to config- 
ure or modify the system so that in their pursuit of their private utilities the agents 
also maximize the world utility. Solving this problem may involve determining or
modifying the number of agents and how they interact with each other and what de- 
grees of freedom of the overall system each controls (i.e., the very definition of the 
agents). When the agents are machine learning algorithms overtly trying to maximize 
their private utilities, the inverse problem may also involve determining or modifying 
the algorithms used by those agents, as well as precisely what private utilities each 
is trying to maximize. 

This chapter presents a mathematical framework for the investigation of collec- 
tives, in particular the investigation of this design problem. A crucial feature of this 
framework is that it involves no modeling of the underlying system or of the algo- 
rithms controlling the agents. For example, only the behavior of an agent (or, more 
precisely, certain broad aspects of it) is formally related to what private utility that 
agent is “trying” to maximize; nothing of what goes on “under the hood” is assumed. 
This behaviorist approach is crucial because in the real world, collectives are often 
so complicated that no tractable model can bear more than a cursory similarity with 
the system it is supposed to represent. More generally, this approach is crucial to 
have the framework be broad enough to encompass, for example, the collectives of 
spin glasses and human economies. 

In the next section we will introduce generalized coordinates. These allow us 
to avoid any restrictions on the kinds of variables comprising the system — they can 
be uncountable, countable, or combinations thereof; with or without an underlying 
topology or metric; and except where explicitly indicated otherwise, all the results of 
the framework apply. The underlying variables can include time or not, and if they 
do, the associated underlying dynamics is arbitrary. The variables can be broken up 
explicitly into separate agents or not, and if they are, there can be arbitrary restric- 
tions on which of the conceivable joint moves of the agents are physically allowed. 
In addition, how the variables are broken up into agents and the number of agents 
are arbitrary and can be modified dynamically (if time is included in the underlying 
variables). Moreover, if time is included as an underlying variable, then some of the 
agents can have their decision “simultaneously” fix the state of one or more variables 
of the system at distinct moments in time. (This is reminiscent of what is decided in
settling on a contract in cooperative game theory.) Again, all of this can be varied in 
an arbitrary fashion. 

Using these generalized coordinates, a central equation can be derived that deter- 
mines how well any of these kinds of systems perform. It does so by breaking perfor- 
mance down into three terms. These terms loosely reflect the concerns of the fields 
of high-dimensional search, economics, and machine learning; the central equation 
is the bridge that couples those fields. 

The following section uses this mathematical framework to introduce a (model- 
independent) formalization of the assumption that a particular component of the sys- 
tem is a “utility-maximizing . . . agent”. That formalization is then used to derive the 
Aristocrat and Wonderful Life private utility functions, two utility functions previ- 
ously intuited that have been found to result in much better world utility than con- 
ventional techniques [17]. This derivation also uncovers (relatively rare) conditions 
under which those utilities should not perform very well. That section ends by
deriving many new results, including the collapsed private utility, and ways to modify
other agents to help a particular agent, along with specification of the scenarios in 
which such techniques should result in good world utility. 

An accompanying paper [22] presents this mathematical framework in a more 
pedagogical manner, including many examples, commentary, and some discussion 
of related fields (e.g., mechanism design in game theory). That paper also discusses 
recent experiments involving a set of previous semiformal heuristics (including the 
Aristocrat and Wonderful Life private utilities) that have been found to be very use- 
ful for the design of collectives. It uses the mathematical framework to explain the 
efficacy of those techniques. It then goes on to make numerical predictions based on 
that framework, and then presents some experimental tests of those predictions. It 
ends by making other (testable) predictions and presents a sample of future research 
topics and open issues. 

This chapter exhaustively presents all of the currently elaborated mathematics 
of the framework, including the details omitted in [22]. In particular, this chapter 
contains theorems not presented there, extensions of the theorems presented there, 
proofs of all theorems, detailed application of the framework to multistep games, and 
the important example of applying the framework to gradient ascent over categorical 
variables. (For pedagogical reasons, the latter two occur as appendices.) Combined, 
these two papers present a mathematical theory along with associated predictions, 
experiments, and engineering recommendations. In this, they lay the foundation for 
a full-fledged science of collectives. 



2 The Central Equation 

2.1 Generalized Coordinates and Intelligence 

We are interested in addressing optimization problems by decomposing them into 
many subproblems, each of which is solved separately. We will not try to choose 
such subproblems so that they are independent of one another or find a way to coor- 
dinate their solutions. Rather we will choose the subproblems so that each of them is 
relatively easy to solve, given the context of a particular current solution to the other 
subproblems, and then solve them in parallel.

To formalize this, let $\zeta$ be an arbitrary space with elements $z$ called worldpoints.
Let $C \subseteq \zeta$ be the set of elements of $\zeta$ that are actually allowed, for example, in
that they are consistent with the laws of physics.$^{1}$ Define a generalized coordinate
variable as a function from $C$ to associated coordinate values. (When the context
makes the precise meaning clear, we will sometimes use the term “coordinate” to
refer to a generalized coordinate variable and sometimes to a value of that variable.)
We will sometimes view a coordinate variable $\rho$ as an exhaustive partition of $C$

1 Whenever expressing a particular system as a collective, it is a good rule to write out the
functional dependencies presumed to specify C( ) as explicitly as one can, to check that
what one has identified as the space $\zeta$ does indeed contain all the important variables.
into nonempty subsets, with $\rho(z)$ being the element of the partition that contains $z$.
Accordingly, we will sometimes write a coordinate value $r = \rho(z)$ as “$r \in \rho$” and a
worldpoint $z'$ sharing that value as “$z' \in r$.”$^{2}$ Intuitively, each “subproblem” of our
overall optimization problem will be formalized in terms of such a partition $\rho$, as
finding the optimal $z$ within the $r \in \rho$ specified by the current solutions to the other
subproblems.

Often we implicitly assume that the set of values that any coordinate variable we 
are discussing can take on forms a measurable set, as does the set of worldpoints 
having any such value. (All integrals are implicit with respect to such measures.) 

As an example, $C$ might consist of the possible joint actions of a set of
computational agents engaged in a noncooperative game [2, 3, 5, 7, 10]. $\rho(z \in C)$ could then
be the actions of all agents except some particular agent identified with $\rho$. In this
case, by fixing all other degrees of freedom, the value of the coordinate $\rho$
implicitly specifies the degrees of freedom that are still “available to be set” by the agent
identified with $\rho$.
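
The game example can be spelled out as a small sketch showing how such a coordinate partitions the set of allowed worldpoints. It assumes three agents with three actions each and no restrictions on joint moves; the action labels and function names are hypothetical.

```python
from itertools import product


def rho(z, agent):
    """Coordinate value: the actions of every agent except `agent`."""
    return z[:agent] + z[agent + 1:]


def partition(worldpoints, coordinate):
    """Group worldpoints by coordinate value; each group is one element
    of the partition induced by the coordinate."""
    cells = {}
    for z in worldpoints:
        cells.setdefault(coordinate(z), []).append(z)
    return cells


actions = ("a", "b", "c")
C = list(product(actions, repeat=3))        # all joint moves of 3 agents
cells = partition(C, lambda z: rho(z, 1))   # coordinate tied to agent index 1

# Within one cell, only the identified agent's move varies.
free_moves = cells[("a", "c")]
```

Every worldpoint lands in exactly one cell, so the cells are nonempty, disjoint, and exhaustive, and the worldpoints inside a cell differ only in the degrees of freedom left to the identified agent.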

A frequently occurring type of coordinate variable is one whose values are
contained in the real numbers. A particularly important example is a world utility
function $G : C \to \mathbb{R}$ that ranks the possible worldpoints of the system. We are always
provided a $G$; the goal in the problem of designing collectives is to maximize $G$.

Our mathematics does not concern $G$ alone, but rather its relationship with some
coordinate utilities $g_\rho : C \to \mathbb{R}$.$^{3}$ Each coordinate utility ranks the possible values
of those degrees of freedom still allowed once the worldpoint has been restricted to
a set of worldpoints $r \in \rho$. Given a set of coordinate variables, $\{\rho\}$, we are interested
in inducing a $z$ that each $g_\rho$ ranks highly (relative to the other worldpoints in the
associated set $r = \rho(z)$), and in the relation between those rankings of $z$ and $G$’s
ranking of $z$. To analyze these issues we need to standardize utility functions so that
the numeric value they assign to $z$ only reflects their relative ranking of $z$ (potentially
just in comparison to the other worldpoints sharing some associated coordinate
value).$^{4}$
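
As an illustration of a value that reflects only relative ranking, the sketch below scores a worldpoint by the fraction of worldpoints sharing its coordinate value whose utility does not exceed its own. This percentile-style score is one hypothetical choice used here for illustration; the worldpoints, utilities, and names are assumptions.

```python
import math


def rank_score(utility, z, cell):
    """Fraction of worldpoints in `cell` whose utility does not exceed
    utility(z). Depends on `utility` only through its ordering of `cell`."""
    return sum(1 for other in cell if utility(other) <= utility(z)) / len(cell)


# Hypothetical worldpoints that share one coordinate value.
cell = [0, 1, 2, 3]


def v(z):
    return z


def w(z):
    # A strictly increasing transformation of v: same ranking, new numbers.
    return math.exp(3 * z) - 7


scores_v = [rank_score(v, z, cell) for z in cell]
scores_w = [rank_score(w, z, cell) for z in cell]
```

The two utilities assign very different numbers to the same worldpoints but order them identically, so their rank scores coincide and lie in [0, 1].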

Generically, we indicate such a standardization by $N$, and for any utility function
$U$, coordinate $\rho$, and $z \in C$, we write the associated value of such a standardization
of the utility $U$ as $N_{\rho,U}(z)$. Define “sgn[x]” to equal $+1$, $0$, or $-1$ in the usual way.
Then we only need to require of a standardization $N$ that $N_{\rho,U}(z)$ be a [0, 1]-valued,
$\rho$-parameterized functional of the pair $(U, U(z))$, one that meets the following two
conditions as we vary $U$ or $z$:

(i) $\forall z \in C$, if for a pair of utilities $V$ and $W$, $\mathrm{sgn}[W(z') - W(z)] = \mathrm{sgn}[V(z') - V(z)]$
$\forall z' \in \rho(z)$, then $N_{\rho,W}(z) = N_{\rho,V}(z)$.

2 In general, we try to use lowercase Greek letters for coordinates and the associated
lowercase roman letter for the value of that coordinate.

3 In previous work, roughly analogous utilities were called “personal utilities” [17]. 

4 It turns out that there never arises a reason to consider the relation between such a stan- 
dardization and the axioms conventionally used to derive utility theory [10], in particular 
those axioms concerning behavior of expectation values of utility. 




